REVERSIBLE EMBEDDED WAVELET SYSTEM IMPLEMENTATION



The present invention relates to the field of data compression and 
decompression systems; particularly, the present invention relates to a 
method and apparatus for lossless and lossy encoding and decoding of 
data in compression/decompression systems. 



Data compression is an extremely useful tool for storing and transmitting large amounts of data. For example, the time required to transmit an image, such as a facsimile transmission of a document, is reduced drastically when compression is used to decrease the number of bits required to recreate the image.




Many different data compression techniques exist in the prior art. Compression techniques can be divided into two broad categories, lossy coding and lossless coding. Lossy coding involves coding that results in the loss of information, such that there is no guarantee of perfect reconstruction of the original data. The goal of lossy compression is that changes to the original data are done in such a way that they are not objectionable or detectable. In lossless compression, all the information is retained and the data is compressed in a manner which allows for perfect reconstruction.

In lossless compression, input symbols or intensity data are converted to output codewords. The input may include image, audio, one-dimensional (e.g., data changing spatially or temporally), two-dimensional (e.g., data changing in two spatial directions (or one spatial and one temporal dimension)), or multi-dimensional/multi-spectral data. If the compression is successful, the codewords are represented in fewer bits than the number of bits required for the uncoded input symbols (or intensity data). Lossless coding methods include dictionary methods of coding (e.g., Lempel-Ziv), run length encoding, enumerative coding and entropy coding. In lossless image compression, compression is based on predictions or contexts, plus coding. The JBIG standard for facsimile compression (ISO/IEC 11544) and DPCM (differential pulse code modulation, an option in the JPEG standard (ISO/IEC 10918)) for continuous-tone images are examples of lossless compression for images.






In lossy compression, input symbols or intensity data are quantized prior to conversion to output codewords. Quantization is intended to preserve relevant characteristics of the data while eliminating unimportant characteristics. Prior to quantization, lossy compression systems often use a transform to provide energy compaction. JPEG is an example of a lossy coding method for image data.

Recent developments in image signal processing continue to focus attention on a need for efficient and accurate forms of data compression coding. Various forms of transform or pyramidal signal processing have been proposed, including multi-resolution pyramidal processing and wavelet pyramidal processing. These forms are also referred to as subband processing and hierarchical processing. Wavelet pyramidal processing of image data is a specific type of multi-resolution pyramidal processing that may use quadrature mirror filters (QMFs) to produce subband decomposition of an original image. Note that other types of non-QMF wavelets exist. For more information on wavelet processing, see Antonini, M., et al., "Image Coding Using Wavelet Transform", IEEE Transactions on Image Processing, Vol. 1, No. 2, April 1992; Shapiro, J., "An Embedded Hierarchical Image Coder Using Zerotrees of Wavelet Coefficients", IEEE Data Compression Conference, pgs. 214-223, 1993. For information on reversible transforms, see Said, A. and Pearlman, W., "Reversible Image Compression via Multiresolution Representation and Predictive Coding", Dept. of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 1993.

Compression is often very time consuming and memory intensive. It is desirable to perform compression faster and/or with reduced memory when possible. Some applications have never used compression because either the quality could not be assured, the compression rate was not high enough, or the data rate was not controllable. However, the use of compression is desirable to reduce the amount of information to be transferred and/or stored.

Digital copiers, printers, scanners and multifunction machines are greatly enhanced with a frame store. A compressed frame store reduces memory, and thus the cost, required for a frame store in these products. However, many frame stores are implemented with random access memories (RAMs). RAM is fast but generally expensive. Hard disks may also be used as memories, and are generally considered inexpensive (or at least less expensive than RAM). Therefore, any system manufacturer would find an advantage in producing a less expensive system using a hard disk, for purposes such as a frame store, instead of RAM.

One problem with using hard disks for time sensitive applications is that it is difficult to directly access information from a hard disk as fast as the same information could be accessed from a RAM. Also, many hard disks utilize compression when storing information onto the disk to increase the amount of information that may be stored onto the disk.






The time necessary to perform the compression may also be a deterrent to using hard disks in time sensitive applications. Both the slow speed inherent in the use of hard disks and the use of compression make utilizing hard disks in time sensitive applications a difficult implementation issue.

[...] a method comprising the steps of: [...]; sending the most important data [...]; and coding the less important data [...] of signaling bits.

The present invention sets forth system implementations that permit usage of inexpensive hard disk technology instead of [...], for [...] matching to a hard disk, and for using compression to match the hard disk to bandwidths of other portions of the system, such as the print engine. [...] the time to compress and decompress is not much slower than the [...] speed. In this way, the present invention performs rate matching [...] to RAM.
A method and apparatus for performing compression and/or decompression is described. In one embodiment, the present invention comprises a system having a buffer, a wavelet transform unit, and a coder. The wavelet transform unit has an input coupled to the buffer to perform a wavelet transform on pixels stored therein and to generate coefficients at an output. The coder is coupled to the wavelet transform unit to code the transformed pixels received from the buffer.






The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.

Figure 1A shows the context dependent relationships. Children are conditioned on their parents.

Figure 2A illustrates an order that is similar to raster order. 

Figure 2B illustrates an alternative embodiment of an order, 
which is referred to herein as the short seam order. 


Figure 2C shows an alternative short seam order. 

Figures 3A through 3H illustrate the result of each application of the TS-transform filter for a four level transform on a wavelet tree of the present invention.

Figure 4A is a block diagram of one embodiment of a forward/inverse filter unit for use in implementing the one-dimensional filters.

Figure 4B is a block diagram of one embodiment of a first level forward transform according to the present invention.

Figure 5 is a block diagram of one embodiment of a complete 
forward transform according to the present invention. 

Figure 6 is a timing diagram of when coefficients are output. 

Figures 7A through 7H show the results (outputs) of each one-dimensional filtering operation for the TT-transform.

Figure 8 is a block diagram of a 10-tap forward/inverse filter unit.

Figure 9 is a block diagram of one embodiment of the overlap unit for the forward/inverse filter of Figure 8.

Figure 10 illustrates the ordering of the codestream and the 
ordering within a coding unit. 

Figure 11 illustrates the bit depths of the various coefficients in a two-level TS-transform and TT-transform decomposition from an input image with b bits per pixel.

Figure 12 is one embodiment of the multipliers for the frequency band used for coefficient alignment in the present invention.

Figure 13A shows a coefficient divided into most important data 
and less important data. 

Figure 13B shows the lossless case where no data is discarded. 

Figure 13C shows the case where one bitplane of data has been 
discarded (i.e., Q=2) because discarding a bitplane is equivalent to 
division by 2. 

Figure 14 is a flow chart illustrating one embodiment of the 
operation of the compression/decompression system. 

Figure 15 shows one embodiment where 6 bits are used for each tree.

Figure 16 is a flow chart for coding the most important data.

Figure 17 is a block diagram of one embodiment of the formatting unit and context model used during the most important data coding pass.



Figure 18 illustrates one embodiment of a first bitplane unit.

Figure 19 is a flow chart illustrating one embodiment of the process of coding a UC bitplane.

Figure 20 is a block diagram of one embodiment of the look-ahead 
and context models for less important data. 

Figure 21 is a block diagram of one embodiment of the context model which provides the conditioning for head bits.

Figure 22 illustrates the memory usage for one embodiment of the context model with conditioning on all neighbors and parents.

Figure 23 is a block diagram of one embodiment of the context 
model for sign bits. 

Figure 24 illustrates one embodiment of parallel coding for the UC.

Figure 25 is a block diagram of one embodiment of the front end of a printer.

Figure 26 is a block diagram of one embodiment of the back end of the printer.

Figure 27 is a block diagram of an alternate embodiment of the back end of the printer.

Figure 28 is a block diagram of one embodiment of an integrated circuit (IC) chip containing the printer compression/decompression.

Figure 29 illustrates the basic timing of the system during printing. 

Figure 30 illustrates one possible embodiment of how pixel data is organized.

Figure 31 illustrates a band buffer of a page.

Figure 32 illustrates a timing diagram of decoding that illustrates concurrent memory access requirements.

Figure 33 shows how circular addressing can be used to handle writing data that is larger than the data read.



Figure 34 illustrates an encoder and decoder pair.

Figure 35 illustrates one embodiment of a binary context model.

Figure 36 illustrates an alternate embodiment of a binary context model.

Figure 37 shows the neighborhood coefficients for every coefficient of a coding unit.

Figure 38 illustrates pyramidal alignment based on MSE alignment.

Figure 39 illustrates MSE alignment of wavelet coefficients.

A method and apparatus for compression and decompression are described. In the following description, numerous details are set forth, such as types of delays, bit rates, types of filters, etc. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.

The following terms are used in the description that follows. A definition has been included for these various terms. However, the definition provided should not be considered limiting to the extent that the terms are known in the art. These definitions are provided to help in the understanding of the present invention.

ABS coding: A method of parallel entropy coding using simple codes (e.g., run codes) for bit generation and probability estimation based on the codewords used (e.g., tabular probability estimation). In one embodiment, ABS coding also includes a method for multiplexing and demultiplexing streams from several coders.

alignment: The degree of shifting of the transform coefficients in a frequency band with respect to the other frequency bands.

Arithmetic coding: Shannon/Elias coding with finite precision arithmetic, not necessarily a binary entropy coder.

B-coding: A binary entropy coder that uses a finite state machine for compression. Unlike Huffman coding, using the finite state machine does well with binary symbols, and is useful for a range of input probabilities.

Binary entropy coder: A noiseless coder which acts on binary (yes/no) decisions, often expressed as the most probable symbol (mps) and least probable symbol (lps).

binary-style: Coding style with edge-fill Gray encoding of the pixels and a particular context model.

binary-style context model: A context model for bi-level and limited-level image data.

bit-significance: A number representation, similar to sign magnitude, with head bits, followed by the sign bit, followed by tail bits, if any. The embedding encodes in bit-plane order with respect to this representation.

child-based order: A scan order through a two dimensional image. It is similar to raster order except that the scan works on two by two blocks. Consider scanning a "parent" frequency band in raster order. Each coefficient will have four children. These children are ordered from top-left, top-right, bottom-left, and bottom-right, followed by the next parent and the next set of four children and so on until the end of the line. Then processing returns to the next two lines and eventually ends in the lower right corner. No lines are skipped. Child-based order is also referred to as 2x2 block order.

coefficient: Components after the transform.

components: Constituent parts of the image. The components make up the pixels. For example, the red, green, and blue bands are component bands. Each individual pixel is made up of a red, green, and blue component. Components and component bands can contain any type of information that has a spatial mapping to the image.

context model: Causally available information relative to the current bit to be coded that gives historically-learned information about the current bit, enabling conditional probability estimation for entropy coding. In binary images, a possible context for a pixel is the previous two pixels in the same row and three pixels from the previous row.

decomposition level: Place in the wavelet decomposition pyramid. This is directly related to resolution.

efficient transform: Transform that achieves the best energy compaction into the coefficients while using the minimum number of bits to represent those coefficients.

Embedded context model: A context model which separates the context bins and results into levels of importance in such a way that effective lossy compression is obtained if the more important values are retained.

Embedded with ordering: A special case of embedded context models where there is not an explicit labeling of importance, but rather the compressed data is ordered with the most important data in the front.

embedded quantization: Quantization that is implied by the codestream. For example, if the importance levels are placed in order, from the most important to the least, then quantization is performed by simple truncation of the codestream. The same functionality is available with tags, markers, pointers, or other signaling. Multiple quantizations can be performed on an image at decode, but only one embedded quantization can be performed at encode time.

entropy coder: A device that encodes or decodes a current bit based on a probability estimation. An entropy coder may also be referred to herein as a multi-context binary entropy coder. The context of the current bit is some chosen configuration of "nearby" bits and allows probability estimation for the best representation of the current bit (or multiple bits). In one embodiment, an entropy coder may include a binary coder, a parallel run-length coder or a Huffman coder.

entry point: A point in the coded data that starts with a known coding state. The decoder can start decoding at this point without decoding the previous data. In most cases, this requires that the context and the binary entropy coder be reset into an initial state. The coded data for each coding unit begins at an entry point.

fixed-length: A system that converts a specific block of data to a specific block of compressed data, e.g., BTC (block truncation coding) and some forms of VQ (vector quantization). Fixed-length codes serve fixed-rate and fixed-size applications, but the rate-distortion performance is often poor compared with variable-rate systems.

fixed-rate: An application or system that maintains a certain pixel rate and has a limited bandwidth channel. In one embodiment, to attain this goal, local average compression is achieved rather than a global average compression. For example, MPEG requires a fixed-rate.

fixed-size: An application or system that has a limited size buffer. In one embodiment, to attain this goal, a global average compression is achieved, e.g., a print buffer. (An application can be fixed-rate, fixed-size, or both.)

frequency band: Each frequency band describes a group of coefficients resulting from the same sequence of filtering operations.

head bits: In bit-significance representation, the head bits are the magnitude bits from the most significant up to and including the first non-zero bit.

Huffman Coder: Generally, a fixed length code which produces an integral number of bits for each symbol.

importance levels: The unit of coded data which corresponds, before compression, to an entire bit-plane of the embedded data. The importance level includes all appropriate bit-planes from the different coefficient frequency bands.

LPS (Least Probable Symbol): The outcome in a binary decision with less than 50% probability. When the two outcomes are equally probable, it is unimportant which is designated mps or lps as long as both the encoder and decoder make the same designation.

Lossless/Noiseless/Reversible coding: Compressing data in a manner which allows perfect reconstruction of the original data.

Lossy Coding: Coding of data which does not guarantee perfect reconstruction of the original data. The changes to the original data may be performed in such a way as to not be visually objectionable or detectable. Often fixed rate is possible.

MPS (Most Probable Symbol): The outcome of a binary decision with more than 50% probability.

overlapped transform: A transform where a single source sample point contributes to multiple coefficients of the same frequency. Examples include many wavelets and the Lapped Orthogonal Transform.

parent coefficient: The coefficient or pixel in the next higher pyramidal level that covers the same image space as the current coefficient or pixel. For example, the parent of the 1SD coefficients is the 2SD coefficients, which is the parent of the 3SD coefficients in Figure 1A.

Probability Estimation Machine/Module: Part of a coding system which tracks the probability within a context.

progressive pixel depth: A codestream that is ordered with deepening bit-planes of data at full image resolution.

progressive pyramidal: Succession of resolutions where each lower resolution is a linear factor of two in each dimension (a factor of four in area).

Q-Coder: A binary arithmetic coder where additions have been substituted for multiplications and probabilities limited to discrete values and probability estimates are updated when bits are output.

raster order: A scan order through a two dimensional image. It starts in the upper left corner, moves left to right, then returns to the left side of the next line, finally ending in the lower right corner. No lines are skipped.

reversible transform: In one embodiment, a reversible transform is an efficient transform implemented with integer arithmetic whose compressed results can be reconstructed into the original.

tail-bits (or tail): In bit-significance representation, the tail bits are the magnitude bits with less significance than the most significant non-zero bit.

tile data segment: Portion of the codestream fully describing one coding unit.

TS-transform: Two-Six transform, a specific reversible wavelet filter pair with a 2-tap low pass analysis and a 6-tap high pass analysis filter. The synthesis filters are quadrature mirror of the analysis filters.

TT-transform: Two-Ten transform, a specific reversible wavelet filter pair with a 2-tap low pass analysis and a 10-tap high pass analysis filter. The synthesis filters are quadrature mirror of the analysis filters.

unified lossless/lossy: The same compression system provides a codestream capable of lossless or lossy reconstruction. In one embodiment of the present invention, this codestream is capable of both without settings or instructions to the encoder.

wavelet filters: The high and low pass synthesis and analysis filters used in wavelet transform.

wavelet transform: A transformation with both "frequency" and "time (or space)" domain constraints. In one embodiment, it is a transform comprising a high pass filter and a low pass filter. The resulting coefficients are decimated by two (critically filtered) and the filters are applied to the low pass coefficients.

wavelet trees: The coefficients, and the pixels, that are related to a single coefficient in the SS section of the highest level wavelet decomposition. The number of coefficients is a function of the number of levels. Figure 1A illustrates the coefficients included in a wavelet tree. The span of a wavelet tree is dependent on the number of decomposition levels. For example, with one level of decomposition, a wavelet tree spans four pixels, with two levels it spans 16, etc. Table 1 below illustrates the number of pixels affected by a wavelet tree for different levels. In two dimensions, each wavelet tree comprises three subtrees called SD, DD and DS.



Table 1

Levels      Width   Height   Total
1 level       2       2          4
2 levels      4       4         16
3 levels      8       8         64
4 levels     16      16        256
5 levels     32      32       1024
6 levels     64      64       4096
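The bit-significance representation defined above (head bits, then the sign bit, then tail bits) can be sketched as follows. This is a minimal illustration only, assuming each coefficient magnitude fits in a fixed number of bits `depth` and writing the sign as the symbol '+' or '-'; the function name `bit_significance` and the list-of-symbols output are hypothetical and are not taken from this document.

```python
from typing import List, Union

Symbol = Union[int, str]  # 0/1 magnitude bits, or '+'/'-' for the sign


def bit_significance(value: int, depth: int) -> List[Symbol]:
    """Head bits, then the sign, then the tail bits, most significant first.

    Head bits run from the most significant magnitude bit up to and including
    the first non-zero bit; the tail bits are the remaining lower bits.
    A zero value has only (all-zero) head bits and no sign."""
    magnitude = abs(value)
    if magnitude >= 1 << depth:
        raise ValueError("magnitude does not fit in the assumed depth")
    bits = [(magnitude >> k) & 1 for k in range(depth - 1, -1, -1)]
    symbols: List[Symbol] = []
    for i, bit in enumerate(bits):
        symbols.append(bit)                              # head bit
        if bit:
            symbols.append("-" if value < 0 else "+")    # sign right after the first 1
            symbols.extend(bits[i + 1:])                 # tail bits
            break
    return symbols


if __name__ == "__main__":
    for v in (-5, 0, 8):
        print(v, bit_significance(v, 4))
```

In this sketch the sign is emitted only once a coefficient becomes significant, so a coder working in bit-plane order spends no sign symbol on coefficients whose head bits are still all zero at the point where the codestream is truncated.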



Overview of the Present Invention

The present invention provides a compression/decompression system having an encoding portion and a decoding portion. The encoding portion is responsible for encoding input data to create compressed data, while the decoding portion is responsible for decoding previously encoded data to produce a reconstructed version of the original input data. The input data may comprise a variety of data types, such as image (still or video), audio, etc. In one embodiment, the data is digital signal data; however, digitized analog data, text data formats, and other formats are possible. The source of the data may be a memory or channel for the encoding portion and/or the decoding portion.

In the present invention, elements of the encoding portion and/or the decoding portion may be implemented in hardware or software, such as that used on a computer system. The present invention provides a lossless compression/decompression system. The present invention may also be configured to perform lossy compression/decompression.

The system of the present invention employs fast lossy/lossless compression by reversible wavelets, which is described in greater detail below. The system may include a printer, such as, for example, a laser printer. In one embodiment, the printer uses an inexpensive hard disk to store a rendered page, greatly reducing the amount of expensive random access memory (RAM) required. Compression is used to match the limited bandwidth of the hard disk or other storage device to the greater bandwidth required by the print engine. The coding technology of the present invention meets the high speed, real-time requirements of the print engine, while the present invention provides either excellent lossless or lossy compression as required by image characteristics and the "bursty" nature of the hard disk.

The following detailed description sets forth a general overview of compression by reversible wavelets, a compressed frame store application, a color laser printer, and embodiments of a printer chip. The printer's rendering engine uses a hard disk for storage. Because the hard disk is slower than the print engine, compression is used to provide rate matching. Display list technology may also be used to decrease the memory required while rendering. A display-list based rendering engine allows the compression system to handle bands of the image independently. Note that although the present invention is described in terms of a printer system, the present invention is applicable to other systems that include compression and/or decompression subsystems as portions thereof.
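The rate-matching role of compression described above can be written as a simple sustained-rate condition. This is a sketch under stated assumptions: the symbols $B_{\mathrm{engine}}$ (the rate at which the print engine consumes uncompressed pixel data) and $B_{\mathrm{disk}}$ (the sustained rate the hard disk can deliver) are introduced here for illustration and are not defined in this document, and the disk's burstiness is assumed to be absorbed by buffering so that only average rates matter.

```latex
\[
  r \;=\; \frac{\text{uncompressed bits}}{\text{compressed bits}}
  \;\ge\; \frac{B_{\mathrm{engine}}}{B_{\mathrm{disk}}}
\]
```

Under these assumptions, an engine that needs data three times faster than the disk can supply would require at least 3:1 compression. When an image does not reach that ratio losslessly, an embedded codestream can be truncated until it does, trading precision for rate.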

Also discussed herein is an embedded unified lossless/lossy compression system. The embedded characteristic of the system allows quality to be determined by the transfer rate of the disk. For easily compressed images (e.g., most documents with text and/or line art), lossless compression is achieved. For difficult to compress images (e.g., documents with noisy natural images and/or halftones), high quality lossy compression is achieved.

For a description of a system(s) that supports both lossless compression and high quality lossy compression of color images, see JP-A-1M4484, entitled "Compression and Decompression with Wavelet Style and Binary Style Including Quantization by Device-Dependent Parser", and U.S. Patent ..., entitled "Method and Apparatus for Reversible Color Conversion".



The present invention employs compression by reversible 
wavelets. 


Wavelet Decomposition

The present invention initially performs decomposition of an image (in the form of image data) or another data signal using reversible wavelets. In the present invention, a reversible wavelet transform comprises an implementation of an exact-reconstruction system in integer arithmetic such that a signal with integer coefficients can be losslessly recovered. An efficient reversible transform is one with a transform matrix whose determinant equals 1 (or almost 1).
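As a worked illustration of the determinant condition, consider the two-tap smooth/detail pair often called the S-transform. It is chosen here only as the simplest example and is not stated in this document to be one of the filters of the invention. Before rounding, its linear part has determinant -1; listing the detail output first gives +1, so the magnitude is 1 and the sign depends only on output ordering. After the smooth value is rounded down to an integer, the pair remains exactly invertible, because a + b and a - b have the same parity, so the bit discarded by the floor is recovered from the least significant bit of d:

```latex
\[
\begin{pmatrix} s \\ d \end{pmatrix}
=
\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix},
\qquad
\det\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ 1 & -1 \end{pmatrix}
= \tfrac{1}{2}(-1) - \tfrac{1}{2}(1) = -1 .
\]
\[
s = \left\lfloor \frac{a+b}{2} \right\rfloor,\quad d = a - b
\quad\Longrightarrow\quad
a = s + \left\lfloor \frac{d+1}{2} \right\rfloor,\quad b = a - d .
\]
\[
\text{Example } (a, b) = (3, 6):\quad s = \lfloor 9/2 \rfloor = 4,\ d = -3;\quad
a = 4 + \lfloor -2/2 \rfloor = 3,\ b = 3 - (-3) = 6 .
\]
```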

By using reversible wavelets, the present invention is able to provide lossless compression with finite precision arithmetic. The results generated by applying the reversible wavelet transform to the image data are a series of coefficients.
The reversible wavelet transform of the present invention may be
implemented using a set of filters. In one embodiment, the filters are a
two-tap low-pass filter and a six-tap high-pass filter that implement a
transform referred to herein as the TS transform, or 2,6 transform. In
another embodiment, the filters are a two-tap low-pass filter and a
ten-tap high-pass filter that implement a transform referred to herein as
the TT transform, or 2,10 transform. These filters may be implemented
using only addition and subtraction operations (plus hardwired bit
shifting). The TT-transform has at least one advantage and at least one
disadvantage with respect to the TS-transform. The advantage is that it
provides better compression than the TS-transform. The disadvantage is
that the longer ten-tap filter requires a higher hardware cost.






Two-Dimensional Wavelet Decomposition

Using the low-pass and high-pass filters of the present invention, a
multi-resolution decomposition is performed. The number of levels of
decomposition is variable and may be any number; however, currently the
number of decomposition levels ranges from two to eight. The
maximum number of levels is the log2 of the maximum of the length or
width of the input.

The most common way to perform the transform on two-dimensional
data, such as an image, is to apply the one-dimensional filters
separately, i.e., along the rows and then along the columns. The
first level of decomposition leads to four different bands of coefficients,
referred to herein as SS, DS, SD, and DD. The letters refer to the smooth
(S) and detail (D) filters defined above, which correspond to low (L) and
high (H) pass filters respectively. Hence, the SS band consists of
coefficients from the smooth filter in both row and column directions.

Each frequency subband in a wavelet decomposition can be further
decomposed. The most common practice is to decompose only the SS
frequency subband further, which may include decomposing the SS
frequency subband of each decomposition level as it is generated.
Such a multiple decomposition is referred to as a pyramidal
decomposition. Each decomposition is denoted by the designations SS,
SD, DS, DD and the decomposition level number.
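The separable, pyramidal scheme just described can be sketched in Python. This is an illustrative sketch, not the embodiment. To keep it short, it substitutes the two-tap S-transform (smooth = floor of the pair average, detail = pair difference) for the TS or TT filters, so no neighbouring values or seams are involved. All function names are hypothetical. Rows are filtered first (horizontal) and columns second (vertical). In each band name, the first letter gives the horizontal filter and the second the vertical filter.

```python
def s_transform_1d(x):
    """Two-tap reversible pair transform of an even-length integer sequence.
    Stand-in for the TS/TT filters (no overlap with neighbouring pairs)."""
    half = len(x) // 2
    s = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]  # smooth (floor)
    d = [x[2 * n] - x[2 * n + 1] for n in range(half)]         # detail
    return s, d


def _filter_columns(rows):
    """Apply the 1-D transform down every column; return (smooth rows, detail rows)."""
    columns = [list(col) for col in zip(*rows)]
    s_cols, d_cols = zip(*(s_transform_1d(col) for col in columns))
    smooth = [list(r) for r in zip(*s_cols)]
    detail = [list(r) for r in zip(*d_cols)]
    return smooth, detail


def decompose_2d(image):
    """One level: horizontal pass over the rows, then vertical pass over the columns."""
    rows_s, rows_d = zip(*(s_transform_1d(row) for row in image))
    ss, sd = _filter_columns(rows_s)  # horizontal smooth -> vertical smooth / detail
    ds, dd = _filter_columns(rows_d)  # horizontal detail -> vertical smooth / detail
    return {"SS": ss, "SD": sd, "DS": ds, "DD": dd}


def pyramid(image, levels):
    """Pyramidal decomposition: only the SS band is decomposed again."""
    bands, ss = {}, image
    for level in range(1, levels + 1):
        out = decompose_2d(ss)
        for name in ("SD", "DS", "DD"):
            bands[f"{level}{name}"] = out[name]
        ss = out["SS"]
    bands[f"{levels}SS"] = ss
    return bands
```

For example, `decompose_2d([[1, 2], [3, 4]])` gives SS = 2, SD = -2, DS = -1 and DD = 0. A four level pyramid of a 16x16 block ends with a single 1x1 band named "4SS".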

Note that with either the TS or TT transforms of the present
invention, the pyramidal decomposition does not increase the coefficient
size.

If the reversible wavelet transform is recursively applied to an
image, the first level of decomposition operates on the finest detail, or
resolution. At a first decomposition level, the image is decomposed into
four sub-images (e.g., subbands). Each subband represents a band of
spatial frequencies. The first level subbands are designated 1SS, 1SD, 1DS,
and 1DD. The process of decomposing the original image involves
subsampling by two in both horizontal and vertical dimensions, such
that the first level subbands 1SS, 1SD, 1DS and 1DD each have one-fourth
as many coefficients as the input has pixels (or coefficients) of the image.

Subband 1SS contains simultaneously low frequency horizontal
and low frequency vertical information. Typically a large portion of the
image energy is concentrated in this subband. Subband 1SD contains low
frequency horizontal and high frequency vertical information (e.g.,
horizontal edge information). Subband 1DS contains high frequency
horizontal information and low frequency vertical information (e.g.,
vertical edge information). Subband 1DD contains high frequency
horizontal information and high frequency vertical information (e.g.,
texture or diagonal edge information).

Each of the succeeding second, third and fourth lower
decomposition levels is produced by decomposing the low frequency SS
subband of the preceding level. Thus, subband 1SS of the first level is
decomposed to produce subbands 2SS, 2SD, 2DS and 2DD of the moderate
detail second level. Similarly, subband 2SS is decomposed to produce
coarse detail subbands 3SS, 3SD, 3DS and 3DD of the third level. Also,
subband 3SS is decomposed to produce coarser detail subbands 4SS, 4SD,
4DS and 4DD of the fourth level. Due to subsampling by two, each second
level subband is one-sixteenth the size of the original image. Each
sample (e.g., pixel) at this level represents moderate detail in the original
image at the same location. Similarly, each third level subband is 1/64
the size of the original image. Each pixel at this level corresponds to
relatively coarse detail in the original image at the same location. Also,
each fourth level subband is 1/256 the size of the original image.

Since the decomposed images are physically smaller than the
original image due to subsampling, the same memory used to store the
original image can be used to store all of the decomposed subbands. In
other words, the original image and decomposed subbands 1SS and 2SS
are discarded and are not stored in a three level decomposition.
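The size relationships above, and the statement that all subbands fit in the memory of the original image, can be checked with exact fractions. This is a small sketch. It assumes both image dimensions are divisible by 2 to the power of the number of levels, and its function names are hypothetical.

```python
from fractions import Fraction


def band_fraction(level: int) -> Fraction:
    """Fraction of the original image size occupied by one subband at `level`
    (subsampling by two horizontally and vertically at every level)."""
    return Fraction(1, 4 ** level)


def total_fraction(levels: int) -> Fraction:
    """Storage of a pyramidal decomposition relative to the original image:
    three detail subbands (SD, DS, DD) per level plus the final SS subband."""
    details = 3 * sum(band_fraction(k) for k in range(1, levels + 1))
    return details + band_fraction(levels)


def band_shape(height: int, width: int, level: int) -> tuple:
    """(height, width) of each subband at `level`, assuming exact divisibility."""
    return height >> level, width >> level
```

For four levels, 3(1/4 + 1/16 + 1/64 + 1/256) + 1/256 = 1. The coefficients therefore occupy exactly as many locations as the input pixels. A 4096x256 coding unit yields first level subbands 2048 wide and 128 high.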

Although only four subband decomposition levels are described,
additional levels could be developed in accordance with the
requirements of a particular system. Also, with other transformations
such as DCT or linearly spaced subbands, different parent-child
relationships may be defined.

Note that pyramidal decomposition does not increase the
coefficient size with the wavelet filters of the present invention.
In other embodiments, other subbands in addition to the SS may
be decomposed also.

Tree Structure of Wavelets 

There is a natural and useful tree structure to wavelet coefficients
in a pyramidal decomposition. A result of the subband decomposition is
a single SS frequency subband corresponding to the last level of
decomposition. On the other hand, there are as many SD, DS, and DD
bands as the number of levels. The tree structure defines the parent of a
coefficient in a frequency band to be a coefficient in the same frequency
band at a lower resolution and related to the same spatial locality.

In the present invention, each tree comprises the SS coefficients
and three subtrees, namely the DS, SD and DD subtrees. The processing
of the present invention is typically performed on the three subtrees.
The root of each tree is a purely smooth coefficient. For a two-
dimensional signal such as an image, there are three subtrees, each with
four children. The tree hierarchy is not limited to two dimensional
signals. For example, for a one dimensional signal, each subtree has one
child. Higher dimensions follow from the one-dimensional and two-
dimensional cases.
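For the two-dimensional case, the parent-child relation reduces to index arithmetic. This sketch makes two assumptions. First, coefficients are addressed as (level, band, row, col), with level 1 the finest. Second, the root SS coefficient of a tree shares the (row, col) of the coarsest detail coefficients directly beneath it. These conventions and the function names are illustrative and are not taken from the figures.

```python
def children(level, band, row, col):
    """Children of a detail coefficient: the 2x2 block at the same spatial
    location in the same-orientation subband one level finer."""
    if level <= 1:
        return []  # level-1 coefficients are leaves
    return [(level - 1, band, 2 * row + dr, 2 * col + dc)
            for dr in (0, 1) for dc in (0, 1)]


def parent(level, band, row, col, levels):
    """Parent of a detail coefficient. At the coarsest level the parent is
    the purely smooth root coefficient of the tree."""
    if level == levels:
        return (levels, "SS", row, col)
    return (level + 1, band, row // 2, col // 2)


def subtree(level, band, row, col):
    """All coefficients of the subtree rooted at (level, band, row, col)."""
    nodes = [(level, band, row, col)]
    for child in children(level, band, row, col):
        nodes.extend(subtree(*child))
    return nodes
```

With four levels, each subtree holds 1 + 4 + 16 + 64 = 85 coefficients. The root plus three subtrees therefore account for the 256 coefficients of a 16x16 wavelet tree.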

The process of multi-resolution decomposition may be performed
using a filtering system. For examples of a two-dimensional, two-level
transform, including one implemented using one-dimensional exemplary
filters, see GB-A-2,303,030, entitled "Method and Apparatus For
Compression Using Reversible Wavelet Transforms and an Embedded
Codestream", and GB-A-2,303,031, entitled "Reversible Wavelet
Transform and Embedded Codestream Manipulation".



Performing the Forward Wavelet Transform

In the present invention, the wavelet transform is performed with
two 1-D operations, horizontal then vertical. In one embodiment, one
piece of hardware performs the horizontal operation while another
performs the vertical operation.

The number of levels determines the number of iterations. In one
embodiment, a four level decomposition is performed using the TT
transform in both the horizontal and vertical directions. In another
embodiment, a four level decomposition is performed using four TS-
transforms instead.

The transform of the present invention is extremely
computationally efficient. In one embodiment, the present invention
orders the computations performed by the transform to reduce the
amount of both on-chip and off-chip memory and bandwidth required.

Computation Orders and Data Flow for the Transform 

As discussed above, in the present invention the basic unit for
computing the transform is the wavelet tree. Assuming a four level
transform, each wavelet tree is a 16x16 block of pixels. A 16x16 block of
pixels (all four components for CMYK images) is input to the transform
of the present invention, and all of the possible calculations to generate
coefficients are performed. (The inverse is similar: a 16x16 block of
coefficients for each component is input and all possible calculations are
performed.) Since the present invention employs an overlapped
transform, information from previous, neighboring trees is stored and
used in calculations. The boundary between the current wavelet tree and
the previous, neighboring information is referred to herein as a seam.
The information that is preserved across a seam to perform the
transform of the present invention is described in detail below.

Ordering of Wavelet Trees 

The ordering of wavelet trees for computing the transform is
important because, in certain applications (e.g., printing), coding units of
the present invention have a large width and a small height. In one
embodiment, each coding unit contains 4096x256 pixels.

In the following discussion, each of the coding units contains
4096x256 pixels. However, it should be noted that the ordering described
below is applicable to coding units of other sizes. Figure 2A illustrates an
order that is similar to raster order. This order is referred to herein as the
long seam transform order. Referring to Figure 2A, the thick lines
indicate the amount of data that is preserved across seams, and are
indicative of how much storage is required to compute the transform.
This data is proportional to one wavelet tree for the horizontal
transform, but to the width of the image (4096 in this example) for the
vertical transform. The amount of storage for this data may require the
use of external memory. However, because of the closeness to raster
order, during the inverse transform, data can be output from the
transform (to, for instance, a printer in a printer application) as soon as a
horizontal row of wavelet trees has been converted to pixels.

Figure 2B illustrates an alternative embodiment of an order,
which is referred to herein as the short seam order. The storage for
seams is proportional to the height of the coding unit (256 in this
example) for the horizontal transform and to one wavelet tree for the
vertical transform. This greatly reduces the amount of memory required,
making on-chip storage practical.

Figure 2C shows an alternative short seam order. At the cost of
storage proportional to one more wavelet tree, the number of
consecutive pixels processed in raster order is increased. This alternative
or similar alternatives may allow for more efficient use of fast page mode
or extended data out (EDO) RAM in the band buffer with little extra cost
in seam memory. The efficiency is gained because most memories
are designed or optimized for accesses to adjacent memory locations.
Therefore, any increase in the use of adjacent memory accesses due to the
seam order results in more efficient memory usage.
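The difference between the long and short seam orders can be reproduced by counting how many tree edges are waiting for their neighbour at the same time. This sketch assumes that the short seam order of Figure 2B visits trees column by column (top to bottom, then left to right). It measures seams in pixels of tree edge and ignores how many values are kept per seam position. All names are hypothetical.

```python
def tree_grid(width, height, levels=4):
    """Number of wavelet trees across and down a coding unit (tree side 2**levels)."""
    side = 1 << levels
    return width // side, height // side


def long_seam_order(cols, rows):
    """Raster order over trees: left to right, then top to bottom."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def short_seam_order(cols, rows):
    """Column order over trees: top to bottom, then left to right (assumed)."""
    return [(r, c) for c in range(cols) for r in range(rows)]


def seam_storage(order, cols, rows, side=16):
    """Peak seam length, in pixels, for the horizontal and vertical transforms.

    A tree's right edge is kept until its right neighbour is processed
    (horizontal seam); its bottom edge is kept until its lower neighbour is
    processed (vertical seam). Each pending edge is `side` pixels long."""
    done = set()
    pend_h = pend_v = peak_h = peak_v = 0
    for r, c in order:
        if (r, c - 1) in done:  # left neighbour's edge is consumed now
            pend_h -= 1
        if (r - 1, c) in done:  # upper neighbour's edge is consumed now
            pend_v -= 1
        done.add((r, c))
        if c + 1 < cols and (r, c + 1) not in done:
            pend_h += 1
        if r + 1 < rows and (r + 1, c) not in done:
            pend_v += 1
        peak_h, peak_v = max(peak_h, pend_h), max(peak_v, pend_v)
    return peak_h * side, peak_v * side
```

For a 4096x256 coding unit (256 by 16 trees), the raster-like order peaks at 16 and 4096 pixels. The column order peaks at 256 and 16 pixels. These peaks match the proportionalities stated for Figures 2A and 2B.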

Computation for One Wavelet Tree 

The following equations define both the TS-transform and the TT-
transform. For an input x(n), the output of the low pass filter (the
smooth signal s(n)) and the output of the high pass filter (the detail
signal d(n)) are computed as shown in the equations below.
$$s(n) = \left\lfloor \frac{x(2n) + x(2n+1)}{2} \right\rfloor$$

$$d(n) = x(2n) - x(2n+1) + t(n)$$

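The inverse transform is shown in the equations below.

$$x(2n) = s(n) + \left\lfloor \frac{p(n) + 1}{2} \right\rfloor$$

$$x(2n+1) = s(n) - \left\lfloor \frac{p(n)}{2} \right\rfloor$$

where p(n) is computed by:

$$p(n) = d(n) - t(n)$$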



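The TS-transform and the TT-transform differ in the definition of t(n).
For the TS-transform,

$$t(n) = \left\lfloor \frac{-s(n-1) + s(n+1) + 2}{4} \right\rfloor$$

For the TT-transform,

$$t(n) = \left\lfloor \frac{3s(n-2) - 22s(n-1) + 22s(n+1) - 3s(n+2) + 32}{64} \right\rfloor$$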

Note that in the equations above and in the following discussion the
notation $\lfloor \cdot \rfloor$ means to round down (toward negative infinity) and is
sometimes referred to as the floor function.
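The equations can be exercised with a short Python sketch of the one-dimensional forward and inverse transforms. It is a software illustration under stated assumptions, not the hardware of the figures. Wherever t(n) would need a smooth value outside the signal, this sketch takes t(n) as zero, matching the hardwired zero that the TS filter unit selects on a boundary. Mirroring is another boundary option and is not modelled. Python's `//` operator rounds toward negative infinity, so it implements the floor directly. The function names are hypothetical.

```python
def t_ts(s, n):
    """TS-transform t(n); zero where s(n-1) or s(n+1) lies outside the signal."""
    if n - 1 < 0 or n + 1 >= len(s):
        return 0
    return (-s[n - 1] + s[n + 1] + 2) // 4


def t_tt(s, n):
    """TT-transform t(n); zero where s(n-2) or s(n+2) lies outside the signal."""
    if n - 2 < 0 or n + 2 >= len(s):
        return 0
    return (3 * s[n - 2] - 22 * s[n - 1] + 22 * s[n + 1] - 3 * s[n + 2] + 32) // 64


def forward(x, t):
    """Even-length integer input x -> (smooth s, detail d)."""
    half = len(x) // 2
    s = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]
    d = [x[2 * n] - x[2 * n + 1] + t(s, n) for n in range(half)]
    return s, d


def inverse(s, d, t):
    """Recover x exactly: t(n) depends only on s, so p(n) = d(n) - t(n)."""
    x = []
    for n in range(len(s)):
        p = d[n] - t(s, n)
        x.append(s[n] + (p + 1) // 2)  # x(2n)
        x.append(s[n] - p // 2)        # x(2n+1)
    return x
```

Because t(n) depends only on smooth values, which the decoder receives unchanged, the inverse can recompute t(n) and recover p(n) exactly. On a linear ramp, both detail outputs are zero away from the boundaries.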



The TS-Transform

The effect of using the six tap filter and a two tap filter at even
locations is that three pieces of information must be stored. The six tap
filter requires two delays. The two tap filter requires one delay so that its
result can be centered with respect to the six tap filter's result.
Specifically, two s(·) values and one d(·) value, or a partial result from
the d(·) calculation, must be stored. Storage of these values is identical
regardless of whether or not a particular filtering operation crosses a
seam.
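The storage argument can be illustrated with a streaming form of the forward TS-transform. This hypothetical generator consumes one input pair per step and carries only three values between steps: s(n-1), s(n) and the partial result p(n) = x(2n) - x(2n+1). Each d(n) is completed one step late, once s(n+1) is known. t(n) is taken as zero at both ends of the sequence, an assumption that matches the boundary handling of the TS filter unit.

```python
def ts_forward_stream(pairs):
    """Yield (s(n), d(n)) for the TS-transform from (x(2n), x(2n+1)) pairs.

    Between steps only s_prev, s_cur and p_cur are kept: two smooth values
    and one partial detail result."""
    s_prev = s_cur = p_cur = None
    for a, b in pairs:
        s_next, p_next = (a + b) // 2, a - b
        if s_cur is not None:
            # d for the previous position is finished now that the next s is known
            t = 0 if s_prev is None else (-s_prev + s_next + 2) // 4
            yield s_cur, p_cur + t
        s_prev, s_cur, p_cur = s_cur, s_next, p_next
    if s_cur is not None:
        yield s_cur, p_cur  # last position: boundary, t(n) = 0
```

For the pairs (3, 0), (0, 3), (9, 9), the generator yields (1, 3), (1, -1) and (9, 0). The middle detail value uses t(1) = floor((-1 + 9 + 2)/4) = 2.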

Figures 3A through 3H illustrate the result of each application of
the TS-transform filter for a four level transform on a wavelet tree of the
present invention. In these figures, the output of the low pass filter is
denoted as "s" for smooth. The output of the high pass filter is denoted
"d" for detail. The "B" denotes an intermediate value used to compute a
"d"; it is an x(2n)-x(2n+1) value. The "B" values are used during the
forward transform; for the inverse transform, a "d" value that is not used
in any computations is stored in its place. The notation "sd" indicates
that a coefficient is the result of first a horizontal low pass filter and then
a vertical high pass filter. The meanings of "ds", "dd", "ss", "dB" and
"sB" are similar. The bold square corresponds to the 256 input pixels.
The shaded "s", "ds" and "ss" values are computed with a previous
wavelet tree and stored for use in the current wavelet tree.

For the forward transform, the inputs to levels 2, 3 and 4 of the
transform are the "ss" coefficients from the previous level. The "sd",
"ds" and "dd" coefficients are finished, so they can be output when
computed. The inverse transform does all the computations in reverse
order with respect to level (the 4th level first, then 3, 2, and finally 1),
and direction (vertical first, horizontal second). Within a pass of the
transform, the data flow of the forward and inverse transforms is
identical; only the computation is different.

TS-Transform Hardware

Figure 4A is a block diagram of one embodiment of a
forward/inverse filter unit for use in implementing the one
dimensional filters. Only memory and computational units are shown;
hardwired shifts are not shown. Referring to Figure 4A, filter unit 4000
handles both the forward and inverse transform. Alternate
embodiments may use separate units for the forward and inverse
transforms. For the forward transform, the n-bit inputs are used, and
the "s" and "d" outputs are generated. For the inverse transform, the "s"
and "d" inputs are used and the other outputs are generated.

Adder 4001 is coupled to receive the n bit inputs and add them
together to produce an output of x(2n+2)+x(2n+3). Adder 4002 subtracts
one n bit input from the other and outputs a quantity of x(2n+2)-x(2n+3).
The outputs of adders 4001 and 4002 are coupled to one input of muxes
4003 and 4004, respectively. The other inputs of muxes 4003 and 4004
are coupled to receive the s and d inputs, respectively. In one
embodiment, the s input is n bits, while the d input is greater than n bits.

The output of muxes 4003 and 4004 is controlled by a
forward/inverse control signal indicative of whether the filter is in the
forward or inverse mode. In either the forward or inverse mode, the
output of mux 4003 is equal to s(n+1). On the other hand, the output of
mux 4004 is equal to p(n+1) in the forward mode and d(n+1) in the
inverse mode. The outputs of muxes 4003 and 4004, along with a feedback
of s(n) output from mux 4006, are coupled to the inputs of register file
4005. Register file 4005 contains the entries for each component for the
length of one wavelet tree. The data typically passes through register file
4005. Based on the spatial location, the inputs to register file 4005 are
delayed to the output. An address input controls the outputs of register
file 4005. In one embodiment, register file 4005 comprises two banks of
memory with one port per bank and is used in a ping-pong style, with
accesses alternating back and forth between the two banks of memory.

The output of mux 4003 is also the s output of the filter unit.
The outputs of register file 4005 are coupled to inputs of mux 4006
along with externally buffered data at seam buffer in 4020. The output
4006A comprises s(n-1), which is a twice delayed version of the output
of mux 4003. The output 4006B comprises s(n), which is a delayed version
of s(n+1). The output 4006C comprises p(n) for the forward mode and
d(n) for the inverse mode. Mux 4006 is also controlled to provide seam
data to be externally buffered at seam buffer out 4021.

The output 4006C is coupled to one input of adders 4008 and
4009. The other input of adders 4008 and 4009 is the output of mux 4015.
Mux 4015 handles boundary conditions. On a boundary, mux 4015
outputs a zero that is hardwired to one of its inputs. The hardwired
zero may be changed to other values in some embodiments. In a
non-boundary condition, mux 4015 outputs t(n), which is output from
adder 4007; adder 4007 receives s(n+1) on one input and s(n-1) on
another input and subtracts s(n-1) from s(n+1).

Adder 4008 adds the output 4006C of mux 4006 to the output of
mux 4015 to generate the d output of the filter unit.

Adder 4009 subtracts the output 4006C of mux 4006 from the
output of mux 4015. The output of adder 4009 is added to s(n) on output
4006B of mux 4006 by adder 4010 to generate an n bit output of the filter
unit. The output of adder 4009 is also subtracted from s(n) on output 4006B of
mux 4006 by adder 4011, which outputs the other n bit output of the filter
unit in the inverse direction.

For seams longer than one wavelet tree, seam data may be stored 
in on-chip static RAM (SRAM) or external memory instead of in register 
file 4005. Mux 4006 provides access to and from this additional seam 
memory. 

Most of the hardware cost of filter unit 4000 is due to register file
4005. The total amount of memory required is dependent on the number
of filter units. In one embodiment, a total of 60 locations for storing
three values (s, s, d or ss, ss, sd) is required. When more filter units are
used, the memory required for each is less. Therefore, the hardware cost
of using multiple filter units is low.

A fast inverse transform allows less latency between the end of
decoding and the start of the data output operation, such as printing.
This reduces the workspace memory required for decompression and
allows larger coding units. A fast forward transform allows the filter to
handle bursts of data when more bandwidth is available, which, in turn,
allows the transform to supply more data to the context model when a
look-ahead allows the context model to process data quickly. If the
forward transform cannot keep up with the context model during encoding,
disk bandwidth during encoding is wasted, delaying the time to start
printing. Also, the control and dataflow may be simplified by having
multiple filters.

Figure 4B is a block diagram of one embodiment of a first level
forward transform according to the present invention. Referring to
Figure 4B, two filter units 401 and 402, such as those described in Figure
4A, perform the first level of the transform. Filter unit 401 performs a
level 1 horizontal transform, while filter unit 402 performs a level 1
vertical transform. In one embodiment, the first level of the transform
operates on 2x2 blocks of input. Four registers 403-406 operate as delay
units to delay outputs of filter unit 401. This is referred to as child-based
order. Register 403 receives the s output of filter unit 401, while registers
404 and 405 receive the d output. The output of register 404 is coupled to
the input of register 406. The outputs of registers 403 and 406 are coupled
to inputs of mux 407, while the s output of filter unit 401 and the
output of register 405 are coupled to the inputs of mux 408. Muxes
407 and 408 select the inputs for filter unit 402 from the delayed
coefficients output from filter unit 401.

Filter unit 401 operates consecutively on two vertically adjacent
pairs of inputs. This creates four coefficients that can, with the proper
delay provided by registers 403-406 for each component, be input to filter
unit 402. Three of the four results can be output immediately; the "ss"
output is processed further.

The first level forward transform operates on groups of four pixels
which are in 2x2 groupings. For the purposes of discussion, the first row
contains pixels a and b while the second row contains pixels c and
d. The operation of the first level transform in Figure 4B is as follows.
During the first cycle, the horizontal transform is applied to pixels a and b,
which are processed by filter unit 401. Filter unit 401 generates Sab,
which is stored in register 403, and Dab, which is stored in registers 404
and 405. In the next cycle, pixels c and d are processed by filter unit 401 to
perform the horizontal transform. The result of applying filter unit
401 is to generate Scd, which is stored in register 403, and Dcd, which is
stored in registers 404 and 405. In this cycle, the Sab from register 403 and
the Scd just produced by filter unit 401 (selected by mux 408) are processed
by filter unit 402, which performs a vertical pass of the transform and
generates SS and SD. Also, during the second cycle, the value Dab moves
from register 404 to register 406. In the next cycle, the value Dab from
register 406 and Dcd from register 405 are processed by filter unit 402,
which generates the outputs DS and DD. In the same cycle, filter unit 401
processes the a and b pixels from the next 2x2 block.
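The register sequence for the 2x2 blocks can be traced with a cycle-level sketch. The model is hypothetical and makes three assumptions. Each filter unit does one operation per clock cycle. Registers update at the end of a cycle, so a value read in a cycle is the one written earlier. A two-tap stand-in (`pair_filter`) replaces the full TS or TT filter unit, which would also add t(n) from neighbouring values.

```python
def pair_filter(a, b):
    """Stand-in for one filter-unit operation: (smooth, detail) of a pair.
    A real TS/TT unit also adds t(n) from neighbouring smooth values."""
    return (a + b) // 2, a - b


def first_level_pipeline(blocks):
    """Cycle model of filter units 401/402 and registers 403-406.

    `blocks` is a sequence of 2x2 blocks ((a, b), (c, d)). Returns a list of
    (cycle, bands, (first, second)) for every result of filter unit 402."""
    r403 = r404 = r405 = r406 = None
    trace, cycle = [], 0
    for (a, b), (c, d) in blocks:
        # First cycle: unit 401 filters the top pair a, b; unit 402 finishes the
        # previous block's DS and DD from registers 406 (Dab) and 405 (Dcd).
        s_ab, d_ab = pair_filter(a, b)
        if r406 is not None:
            trace.append((cycle, "DS/DD", pair_filter(r406, r405)))
        r403, r404, r405 = s_ab, d_ab, d_ab
        cycle += 1
        # Second cycle: unit 401 filters the bottom pair c, d; unit 402 combines
        # Sab (register 403) with the new Scd to give SS and SD.
        s_cd, d_cd = pair_filter(c, d)
        trace.append((cycle, "SS/SD", pair_filter(r403, s_cd)))
        r406 = r404  # Dab moves from register 404 to register 406
        r403, r404, r405 = s_cd, d_cd, d_cd
        cycle += 1
    if r406 is not None:  # drain: the last block's DS and DD
        trace.append((cycle, "DS/DD", pair_filter(r406, r405)))
    return trace
```

Once the pipeline is full, filter unit 402 produces a result every cycle. SS/SD pairs fall on odd cycles, and DS/DD pairs on the following even cycles, overlapping the next block's top pair.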

Figure 5 is a block diagram of one embodiment of a forward
transform according to the present invention. Referring to Figure 5,
level 1 transform 502 performs the level 1 transform. In one
embodiment, level 1 transform 502 comprises the level 1 transform of Figure
4B. Filter unit 505 handles levels 2, 3 and 4 of the transform. A memory
503 stores "ss" coefficients until sufficient coefficients are available to
perform the transform. The number of coefficients which need to be
stored is shown in Table 2 below. (Each location stores a coefficient for
each component.)



Table 2 - "ss" delay memory

    Between levels    Memory needed
    1 and 2           9 locations
    2 and 3           8 locations
    3 and 4           4 locations



Order unit 504 multiplexes the proper inputs into filter unit 505.
Input buffer 501 and output buffer 506 may be required to match the
transfer order required by the transform to the order required by the
band buffer or context model.

For the inverse transform, the dataflow is reversed, with the level
4 inverse transform performed first, followed by the level 3, level 2 and
level 1 transforms in order. The output of the level 2 transform is fed
into the first level transform hardware of level 1 transform 502. Also,
vertical filtering is performed before horizontal filtering. Because the
horizontal and vertical filtering are identical except that one direction
requires access to additional memory for seams, reversing the dataflow
can be performed with a small amount of multiplexing. Before the
inverse transform, the two byte coefficients need to be converted from
the embedded form with two signaling bits into normal two's
complement numbers.
The elements described in Figures 4B and 5 may also be used for
the TT-transform.

Transform Timing 

The transform timing of the forward transform of Figure 5 is based
on the timing of the individual filter units. The first filter unit, filter
unit 401, computes horizontal level 1 transforms, while the second filter
unit, filter unit 402, computes vertical level 1 transforms. The third filter
unit, filter unit 505, computes transforms for levels 2 through 4 or is idle.
In one embodiment, the third filter unit (505), when not idle,
computes horizontal transforms during even clock cycles and vertical
transforms during odd clock cycles. The timing for the inverse transform
is similar (but reversed).

In the following example, 2x2 blocks within a wavelet tree are
processed in the transpose of raster order. Note that less input/output
(I/O) buffering might be required to support fast page mode/extended
data out (EDO) DRAM if 2x2 blocks within a wavelet tree are processed in
raster order instead.
Figure 6 is a timing diagram of when coefficients are output. The
following timing is for each pixel. There are four components per pixel.

starting at time 0 do:
    for (x = 0; x < 16/2; x++)
        for (y = 0; y < 16; y++)
            apply level 1 horizontal filter at x,y

starting at time 1 do:
    for (x = 0; x < 16/2; x++)
        for (y = 0; y < 16/2; y++)
            for (xx = -1; xx < 1; xx++)    /* 0 = smooth, -1 = previous detail */
                apply level 1 vertical filter at 2*x+xx,y

for (x = 0; x < 8/2; x++)
    starting at time 18+x*32, at even times do:
        for (y = 0; y < 8; y++)
            apply level 2 horizontal filter at x,y

for (x = 0; x < 8/2; x++)
    starting at time 21+x*32, at odd times do:
        for (y = 0; y < 8/2; y++)
            for (xx = -1; xx < 1; xx++)    /* 0 = smooth, -1 = previous detail */
                apply level 2 vertical filter at 2*x+xx,y

for (x = 0; x < 4/2; x++)
    starting at time 66+x*64, at even times do:
        for (y = 0; y < 4; y++)
            apply level 3 horizontal filter at x,y

for (x = 0; x < 4/2; x++)
    starting at time 69+x*64, at odd times do:
        for (y = 0; y < 4/2; y++)
            for (xx = -1; xx < 1; xx++)    /* 0 = smooth, -1 = previous detail */
                apply level 3 vertical filter at 2*x+xx,y

at time 138:
    apply level 4 horizontal filter at 0,0
at time 140:
    apply level 4 horizontal filter at 0,1
at time 141:
    apply level 4 vertical filter at 0,0     /* smooth */
at time 143:
    apply level 4 vertical filter at -1,0    /* previous detail */
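The level 2 to 4 part of this schedule, which all runs on filter unit 505, can be checked mechanically. The sketch below assumes that "starting at time T, at even (or odd) times" means successive operations of that loop issue at T, T+2, T+4, and so on. Under that reading it lists every level 2 to 4 operation for one wavelet tree. The function name is hypothetical.

```python
def unit505_schedule():
    """(time, operation) pairs issued by filter unit 505 for one wavelet tree."""
    ops = []

    def every_other_cycle(start, count, label):
        # one operation every two cycles, beginning at `start`
        ops.extend((start + 2 * i, label) for i in range(count))

    for x in range(8 // 2):  # level 2
        every_other_cycle(18 + 32 * x, 8, "level 2 horizontal")
        every_other_cycle(21 + 32 * x, (8 // 2) * 2, "level 2 vertical")
    for x in range(4 // 2):  # level 3
        every_other_cycle(66 + 64 * x, 4, "level 3 horizontal")
        every_other_cycle(69 + 64 * x, (4 // 2) * 2, "level 3 vertical")
    ops += [(138, "level 4 horizontal"), (140, "level 4 horizontal"),
            (141, "level 4 vertical"), (143, "level 4 vertical")]
    return sorted(ops)
```

Under this reading, the 84 operations never share a cycle. Horizontal passes fall on even cycles and vertical passes on odd cycles, and the last operation issues at time 143.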
TT-transform 

Figures 7A-7H show the results (outputs) of each one dimensional
filtering operation of the TT transform. A rectangle indicates coefficients
in a single wavelet tree that correspond to the input pixels currently
being processed; shading indicates coefficients that are stored from the
previous tree. Values labeled "B" are intermediate results that are stored
(and are the differences between adjacent samples). The TT-transform is
similar to the TS-transform, but requires more storage.

Figure 8 is a block diagram of a 10 tap forward/inverse filter unit.
Note that hardwired shifts and rounding offsets are not shown to avoid
obscuring the present invention. Note that mux 806 in Figure 8 can also
be used for mirroring at transform boundaries. For one implementation
of mirroring, zeroing the "d" input and multiplexing the s(n+2) input of
the overlap unit are also required.

Referring to Figure 8, adders 801 and 802 are coupled to receive the
two n bit inputs during the forward pass of the filter unit. Adder 801 adds
the two n bit inputs and outputs a value coupled to one input of mux 803.
Adder 802 subtracts one input from the other, generating its output to
one input of mux 804. Muxes 803 and 804 are also coupled to receive the
s and d inputs respectively for the inverse mode operation of the filter
unit. The output of mux 803 is an n bit input equal to s(n+2), while the
output of mux 804 is an n+1 bit input that is p(n+2) for the forward pass
and d(n+2) for the inverse pass.

Both outputs of muxes 803 and 804 are coupled to inputs of
memory 805. Also coupled to inputs of memory 805 are the outputs
806A and 806D-F output from mux 806. Memory 805 delays the inputs to
its outputs based on spatial location. In one embodiment, memory 805
comprises a register file or an SRAM which is operated in a ping pong
fashion with two banks and one port per bank. An address is coupled to
an input of memory 805 to control the outputs which are generated to mux
806. In one embodiment, memory 805 stores 16 or 28 locations per
component.

The outputs of memory 805 are coupled to inputs of mux 806
along with external buffer data received from the seam buffer in 820. The
output 806A of mux 806 comprises s(n+1), which is a once delayed
version of s(n+2) at the output of mux 803. The output 806B of mux
806 comprises s(n), which is a twice delayed version of the output of mux
803. The output 806C of mux 806 comprises p(n) for the forward pass,
which is a twice delayed version of the output of mux 804, and d(n) in the
inverse pass, which is a twice delayed version of the output of mux 804.
The output 806D comprises s(n-2), which is a four times delayed version
of the output of mux 803. The output 806E of mux 806 comprises s(n-1),
which is a three times delayed version of the output of mux 803. Lastly, the
output 806F comprises p(n+1) in the forward pass, which is a once delayed
version of the output of mux 804, and d(n+1) for the inverse pass, which
is a once delayed version of the output of mux 804.

Overlap unit 807 is coupled to receive the output of mux 803 along
with the outputs 806A, D and E from mux 806. In response to these
inputs, overlap unit 807 generates t(n). One embodiment of the overlap
unit is described in Figure 9.

The output of overlap unit 807, t(n), is coupled to one input of
adders 808 and 809. Adder 808 adds t(n) to the output 806C of mux 806 to
generate the d output of the filter unit. Adder 809 subtracts the output
806C of mux 806 from t(n). The output of adder 809 is coupled to an
input of each of adders 810 and 811. Adder 810 adds the output of adder
809 to the output 806B of mux 806 to produce one of the n bit outputs of
the filter when operating as an inverse filter unit. Adder 811 subtracts
the output of adder 809 from the output 806B of mux 806 to generate the
other output of the filter unit when operating as an inverse filter.

Figure 9 is a block diagram of one embodiment of the overlap unit
for the forward/inverse filter of Figure 8. Referring to Figure 9, the
overlap unit comprises adders 901-906, multipliers 907-909 and divider
910. The multipliers and divider may be hardwired shifts.

The overlap unit of Figure 9 computes t(n) for the TT transform
described above. Referring to Figure 9, adder 901 is coupled to receive the
s(n+2) input and subtract it from the s(n-2) input, and generates an output
which is coupled to one input of adder 903. Adder 902 is coupled to
receive the s(n-1) input and subtract from it the s(n+1) input. The output
of adder 902 is coupled to the inputs of multiplier 907 and multiplier 908.
Multiplier 907 multiplies its input by two. In one embodiment the
multiplication is performed by shifting the bits of the input to the left
one position. The output of multiplier 907 is coupled to the other input
of adder 903.

Multiplier 908 multiplies the output of adder 902 by sixteen. In
one embodiment, the multiplication is performed by shifting the bits that
are output from adder 902 to the left four bit positions. The output of
multiplier 908 is coupled to one input of adder 905. The output of adder
903 is coupled to one input of adder 904 and also to the input of
multiplier 909.

Multiplier 909 multiplies the output of adder 903 by two. In one
embodiment, this multiplication is performed by shifting the bits that are
output from adder 903 to the left one bit position. The output of
multiplier 909 is coupled to the other input of adder 904. The output of
adder 904 is coupled to the other input of adder 905. The output of adder
905 is coupled to an input of adder 906, which adds it to 32, a
hardwired input. The output of adder 906 is coupled to the input of
divider 910. Divider 910 divides the input by 64. In one
embodiment, this division is accomplished by shifting the bits of the
input to the right six bit positions. The output of divider 910 comprises
the t(n) output. Note also that Figure 9 shows each of the outputs with
the current value on the lines.
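The Figure 9 datapath can be checked in software. The following Python sketch is a minimal model. It assumes integer inputs and treats divider 910 as an arithmetic right shift (Python's `>>` rounds toward negative infinity). The function and argument names are illustrative. Tracing the connections as described gives t(n) = floor((3(s(n-2) - s(n+2)) + 22(s(n-1) - s(n+1)) + 32) / 64), and the second function writes that expression out directly for comparison.

```python
def overlap_t(s_m2: int, s_m1: int, s_p1: int, s_p2: int) -> int:
    """t(n) computed step by step along the Figure 9 adders and shifts (sketch)."""
    a901 = s_m2 - s_p2      # adder 901: s(n-2) - s(n+2)
    a902 = s_m1 - s_p1      # adder 902: s(n-1) - s(n+1)
    m907 = a902 << 1        # multiplier 907: x2 as a left shift by one
    m908 = a902 << 4        # multiplier 908: x16 as a left shift by four
    a903 = a901 + m907      # adder 903
    m909 = a903 << 1        # multiplier 909: x2
    a904 = a903 + m909      # adder 904: equals 3 * a903
    a905 = m908 + a904      # adder 905
    a906 = a905 + 32        # adder 906: hardwired rounding offset
    return a906 >> 6        # divider 910: /64 as an arithmetic right shift


def overlap_t_closed_form(s_m2: int, s_m1: int, s_p1: int, s_p2: int) -> int:
    """The same value written as a single floor division."""
    return (3 * (s_m2 - s_p2) + 22 * (s_m1 - s_p1) + 32) // 64


print(overlap_t(10, 20, 5, 0), overlap_t_closed_form(10, 20, 5, 0))
```

Each multiplier and the divider cost nothing in hardware beyond wiring, which is why the constants 3 and 22 are built from the shifts x2 and x16 plus additions.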

Note that in both the reversible TS-transform and TT-transform,
like the S-transform, the low-pass filter is implemented so that the range
of the input signal x(n) is the same as that of the output signal s(n). That is,
there is no growth in the smooth output. If the input signal is b bits deep,
then the smooth output is also b bits. For example, if the signal is an 8-bit
image, the output of the low-pass filter is also 8 bits. This is an important
property for a pyramidal system where the smooth output is
decomposed further by, for example, successively applying the low-pass
filter. In prior art systems, the range of the output signal is greater than
that of the input signal, thereby making successive applications of the
filter difficult. Also, there is no systematic error due to rounding in the
integer implementation of the transform, so all error in a lossy system
can be controlled by quantization. In addition, the low-pass filter has
only two taps, which makes it a non-overlapping filter. This property is
important for the hardware implementation.
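The no-growth property can be illustrated with the well-known S-transform pair, s(n) = floor((x(2n) + x(2n+1))/2) and d(n) = x(2n) - x(2n+1). This sketch assumes the two-tap low-pass of the TS- and TT-transforms is this same floor average. Their high-pass outputs differ, so only the smooth path is representative here. The function names are illustrative.

```python
def s_forward(x):
    """One level of the S-transform on an even-length integer sequence."""
    smooth, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        smooth.append((a + b) >> 1)  # floor average: stays within the input range
        detail.append(a - b)         # signed detail: needs one extra bit
    return smooth, detail


def s_inverse(smooth, detail):
    """Exact integer inverse of s_forward."""
    x = []
    for s, d in zip(smooth, detail):
        a = s + ((d + 1) >> 1)
        x.extend([a, a - d])
    return x


# Repeatedly applying the low-pass to 8-bit samples keeps every value in 0..255.
level = [0, 255, 17, 200, 255, 255, 3, 4]
while len(level) > 1:
    level, _ = s_forward(level)
    assert all(0 <= v <= 255 for v in level)
```

The inverse recovers the input exactly because x(2n) + x(2n+1) and d(n) always have the same parity, so no information is lost in the floor.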

Embedded Ordering 

In the present invention, the coefficients generated as a result of
the wavelet decomposition are entropy coded. The coefficients initially
undergo embedded ordering, in which the coefficients are ordered in a
visually significant order or, more generally, ordered with respect to
some error metric (e.g., a distortion metric). Error or distortion metrics
include, for example, peak error and mean squared error (MSE).
Additionally, ordering can be performed to give preference to
bit-significance, spatial location, relevance for database querying, and
directionality (vertical, horizontal, diagonal, etc.).

The ordering of the data is performed to create the embedded
quantization of the codestream. In the present invention, two ordering
systems are used: a first for ordering the coefficients and a second for
ordering the binary values within a coefficient. The ordering of the
present invention produces a bitstream that is thereafter coded with a
binary entropy coder.
Bit-Significance Representation

Most transform coefficients are signed numbers even when the
original components are unsigned (any coefficients output from at least
one detail filter are signed). In one embodiment, the embedded order
used for binary values within a coefficient is by bit-plane. The coefficients
are expressed in bit-significance representation prior to coding.
Bit-significance is a sign-magnitude representation where the sign bit, rather
than being the most significant bit (MSB), is encoded with the first non-zero
magnitude bit. That is, the sign bit follows the first non-zero
magnitude bit rather than preceding all of the magnitude bits. Also, the
sign bit is considered to be in the same bit-plane as the most significant
non-zero magnitude bit.

Bit-significance format represents a number using three sets of bits:
head, tail, and sign. The head bits are all the zero bits from the MSB up to
and including the first non-zero magnitude bit. The bit-plane in which
the first non-zero magnitude bit occurs defines the significance of the
coefficient. The set of tail bits comprises the magnitude bits after the first
non-zero magnitude bit to the LSB. The sign bit simply denotes the sign,
where a 0 may represent a positive sign and 1 may represent a negative
sign. A number, such as ±2^n, with a non-zero bit as the MSB has only
one head bit. A zero coefficient has no tail or sign bits. Table 3 shows all
possible values for four-bit coefficients ranging from -8 to 7.



Table 3  Bit-Significance Representation for 4-Bit Values

Decimal   Two's Complement   Sign Magnitude   Bit-Significance
  -8           1000
  -7           1001               1111            11 1 1
  -6           1010               1110            11 1 0
  -5           1011               1101            11 0 1
  -4           1100               1100            11 0 0
  -3           1101               1011            0 11 1
  -2           1110               1010            0 11 0
  -1           1111               1001            0 0 11
   0           0000               0000            0 0 0
   1           0001               0001            0 0 10
   2           0010               0010            0 10 0
   3           0011               0011            0 10 1
   4           0100               0100            10 0 0
   5           0101               0101            10 0 1
   6           0110               0110            10 1 0
   7           0111               0111            10 1 1



In Table 3, the bit-significance representation shown in each
column includes one or two bits. In the case of two bits, the first bit is the
first one bit and is followed by the sign bit.
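The rule behind the bit-significance column of Table 3 can be written as a short Python sketch. It assumes that 0 denotes a positive sign, as in the text. The function name is illustrative. The function emits one group per bitplane, from MSB to LSB, and appends the sign bit to the first non-zero magnitude bit.

```python
def bit_significance(value: int, magnitude_bits: int) -> str:
    """Bit-significance form: one group per bitplane, MSB first; the first
    non-zero magnitude bit is followed by the sign bit (0 = +, 1 = -)."""
    mag = abs(value)
    if mag >= 1 << magnitude_bits:
        raise ValueError("magnitude does not fit in the given number of bits")
    sign = "1" if value < 0 else "0"
    groups, seen_one = [], False
    for plane in range(magnitude_bits - 1, -1, -1):
        bit = (mag >> plane) & 1
        if bit and not seen_one:
            groups.append("1" + sign)  # last head bit carries the sign
            seen_one = True
        else:
            groups.append(str(bit))
    return " ".join(groups)


# Reproduce the bit-significance column of Table 3 (3 magnitude bits).
for v in range(-7, 8):
    print(f"{v:3d}  {bit_significance(v, 3)}")
```

The value -8 raises an error here, matching its blank entries in the table: its magnitude needs four bits.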

In the case where the values are non-negative integers, such as
occurs with respect to the intensity of pixels, the order that may be used is
the bitplane order (e.g., from the most significant to the least significant
bitplane). In embodiments where two's complement negative integers
are also allowed, the embedded order of the sign bit is the same as the
first non-zero bit of the absolute value of the integer. Therefore, the sign
bit is not considered until a non-zero bit is coded. For example, using
sign-magnitude notation, the 16-bit number -7 is:

1000000000000111

On a bit-plane basis, the first twelve decisions will be "insignificant" or
zero. The first 1-bit occurs at the thirteenth decision. Next, the sign bit
("negative") will be coded. After the sign bit is coded, the tail bits are
processed. The fifteenth and sixteenth decisions are both "1".
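The decision sequence for the 16-bit example can be reproduced with a short Python sketch. In this sketch, the sign is emitted as its own decision immediately after the first 1-bit. The function name and the "+"/"-" symbols are illustrative.

```python
def embedded_decisions(value: int, magnitude_bits: int) -> list:
    """Binary decisions for one coefficient in bitplane order: head bits up
    to the first 1, then the sign, then the remaining (tail) bits."""
    mag = abs(value)
    decisions, significant = [], False
    for plane in range(magnitude_bits - 1, -1, -1):
        bit = (mag >> plane) & 1
        decisions.append(str(bit))
        if bit and not significant:
            significant = True
            decisions.append("-" if value < 0 else "+")  # sign right after first 1
    return decisions


# -7 as a 16-bit sign-magnitude number has 15 magnitude bits.
seq = embedded_decisions(-7, 15)
print(len(seq), "".join(seq))
```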

Since the coefficients are coded from the most significant bitplane to the
least significant bitplane, the number of bitplanes in the data must be
determined. In the present invention, this is accomplished by finding an
upper bound on the magnitudes of the coefficient values, calculated from
the data or derived from the depth of the image and the filter coefficients.
For example, if the upper bound is 149, then there are 8 bits of
significance, or 8 bitplanes. For speed in software, bitplane coding may
not be used. In an alternate embodiment, a bitplane is coded only when a
coefficient becomes significant as a binary number.
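For a bound taken from the data itself, the count is the bit length of the largest magnitude. A minimal Python sketch follows. It does not cover the alternative of deriving the bound from the image depth and the filter coefficients. The function name is illustrative.

```python
def bitplanes_needed(coefficients) -> int:
    """Number of magnitude bitplanes needed to code every coefficient,
    using the largest magnitude present as the upper bound."""
    upper_bound = max((abs(c) for c in coefficients), default=0)
    return upper_bound.bit_length()


print(bitplanes_needed([12, -149, 37, 0]))  # upper bound 149 needs 8 bitplanes
```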

Coefficient Alignment 
The present invention aligns coefficients with respect to each other
before the bit-plane encoding. This is because the coefficients in the
different frequency subbands represent different frequencies, similar to
the FFT or the DCT. By aligning coefficients, the present invention
controls quantization. The less heavily quantized coefficients will be
aligned toward the earlier bit-planes (e.g., shifted to the left). Thus, if the
stream is truncated, these coefficients will have more bits defining them
than the more heavily quantized coefficients.

In one embodiment, the coefficients are aligned for the best
rate-distortion performance in terms of SNR or MSE. There are many
possible alignments, including one that is near-optimal in terms of
statistical error metrics such as MSE. Alternately, the alignment could
allow a psychovisual quantization of the coefficient data. The
alignment has significant impact on the evolution of the image quality
(or, in other words, on the rate-distortion curve), but has negligible impact
on the final compression ratio of the lossless system. Other alignments
could correspond to specific coefficient quantization, Region of Interest
fidelity encoding, or resolution progressive alignment.

The alignment may be signaled in the header of the compressed
data, or it may be fixed for a particular application (i.e., the system only
has one alignment). The alignment of the different sized coefficients is
known to both the coder and decoder and has no impact on the entropy
coder efficiency.

The bit depths of the various coefficients in a two-level
TS-transform and TT-transform decomposition from an input image with b
bits per pixel are shown in Figure 11. Figure 12 is one embodiment of the
multipliers for the frequency band used for coefficient alignment in the
present invention. To align the coefficients, the 1-DD coefficient size is
used as a reference, and shifts are given with respect to this size. A shift
of n is a multiplication by 2^n.

In one embodiment, the coefficients are shifted with respect to the
magnitude of the largest coefficient to create an alignment of all the
coefficients in the image. The aligned coefficients are then handled in
bit-planes called importance levels, from the most significant importance
level to the least significant importance level. The sign is encoded with
the last head bit of each coefficient. The sign bit is in whatever
importance level the last head bit is in. It is important to note that the
alignment simply controls the order in which the bits are sent to the entropy coder.
Actual padding, shifting, storage, or coding of extra zero bits is not
performed.

Table 4 illustrates one embodiment of alignment numbers for
aligning coefficients.

Table 4  Coefficient Alignment

Subband       Alignment
1-DD          reference
1-DS, 1-SD    Left 1
2-DD          Left 1
2-DS, 2-SD    Left 2
3-DD          Left 2
3-DS, 3-SD    Left 3
4-DD          Left 3
4-DS, 4-SD    Left 4



The alignment of different sized coefficients is known to both the
coder and the decoder and has no impact on the entropy coder efficiency.
Note that coding units of the same data set may have different
alignments.
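Alignment only reorders bits, so it can be modelled by giving each magnitude bit an importance level. That level equals its bitplane plus the Table 4 shift of its subband. A minimal Python sketch follows. The per-subband bit depths passed in are hypothetical inputs (Figure 11 gives the actual ones). Breaking ties within a level by the order in which subbands are listed is an assumption.

```python
# Table 4 shifts relative to the 1-DD reference (a shift of n means x 2**n).
ALIGNMENT = {
    "1-DD": 0, "1-DS": 1, "1-SD": 1,
    "2-DD": 1, "2-DS": 2, "2-SD": 2,
    "3-DD": 2, "3-DS": 3, "3-SD": 3,
    "4-DD": 3, "4-DS": 4, "4-SD": 4,
}


def bit_order(depths):
    """List (importance_level, subband, bitplane) from most to least important.

    depths maps a subband name to its number of magnitude bitplanes
    (hypothetical input). A bit in bitplane p of subband b sits at level
    p + ALIGNMENT[b]; nothing is actually shifted or padded.
    """
    order = [(p + ALIGNMENT[b], b, p) for b, n in depths.items() for p in range(n)]
    # Highest level first; the sort is stable, so ties keep the subband order.
    order.sort(key=lambda t: -t[0])
    return order


for level, band, plane in bit_order({"2-DS": 3, "1-DD": 3}):
    print(level, band, plane)
```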
Ordering of the Codestream and the Context Model

Figure 10 illustrates the ordering of the codestream and the ordering
within a coding unit. Referring to Figure 10, the header 1001 is followed by
the coding units 1002 in order from top band to bottom. (The header 1001 is
optional in applications designed for a single image type.) Each coding unit
includes most important data 1003, less important data 1004, and least
important data 1005.

The context model determines both the order in which data is
coded and the conditioning used for specific bits of the data. Ordering
will be considered first. The highest level ordering of the data has
already been described above. The data is divided into "most important
data", referred to interchangeably herein as the most important chunk
(MIC), which is coded losslessly in transform order, and "less important
data", which is referred to interchangeably herein as the least important
chunk (LIC) and is coded in an embedded unified lossless/lossy manner.

Within each bit-plane, the coefficients are processed from low
resolution to high resolution (from low frequency to high frequency).
The coefficient subband order within each bit-plane is from the high level
(low resolution, low frequency) to the low level (high resolution, high
frequency). Within each frequency subband, the coding is in a defined
order. In one embodiment, the order may be raster order, 2x2 block
order, serpentine order, Peano scan order, etc.
In the case of a four-level decomposition using the codestream of
Figure 3, the order is as follows:

4-SS, 4-DS, 4-SD, 4-DD, 3-DS, 3-SD, 3-DD, 2-DS, 2-SD, 2-DD, 1-DS, 1-SD, 1-DD
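The subband order extends to any number of levels, and each within-subband scan is a simple traversal. A minimal Python sketch follows. Serpentine order is shown as one of the listed options. The function names are illustrative.

```python
def subband_order(levels: int) -> list:
    """Subband coding order: the SS band of the highest level first, then
    DS, SD, DD from the highest level down to level 1."""
    order = [f"{levels}-SS"]
    for level in range(levels, 0, -1):
        order += [f"{level}-DS", f"{level}-SD", f"{level}-DD"]
    return order


def serpentine(rows: int, cols: int):
    """Yield (row, col) for a serpentine scan of one subband."""
    for r in range(rows):
        cols_in_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cols_in_order:
            yield r, c


print(", ".join(subband_order(4)))
```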

One embodiment of the context model used in the present
invention is described below. This model uses bits within a coding unit
based on the spatial and spectral dependencies of the coefficients. The
available binary values of the neighboring coefficients and parent
coefficients can be used to create contexts. The contexts, however, are
causal for decodability and few in number for efficient adaptation.

The present invention provides a context model to model the
bitstream created by the coefficients in the embedded bit-significance
order for the binary entropy coder.

Figure 37 shows the neighborhood coefficients for every coefficient
of a coding unit. Referring to Figure 37, the neighborhood coefficients are
denoted with the obvious geographical notations (e.g., N=north,
NE=northeast, etc.). Given a coefficient, such as P in Figure 37, and a
current bit-plane, the context model can use any information from all of
the coding unit prior to the given bit-plane. The parent coefficient of the
present coefficient is also used for this context model.

The head bits are the most compressible data. Therefore, a large
amount of context, or conditioning, is used to enhance compression.
Rather than using the neighborhood or parent coefficient values to
determine the context for the present bit of the present coefficient, the
information is reduced to two signaling bits described in conjunction
with Figure 13A. This information can be stored in memory or
calculated dynamically from the neighbor or parent coefficient.

Implementing Embedding for Storage to Disk 

One embodiment of the embedding scheme for the present
invention is based on the fact that when starting to encode data, the
entire band buffer memory is full of data, such that there is no extra space
available in the band for use as workspace memory. The present
invention writes some of the less important data to memory to be
embedded later. In the present invention, the data that is to be embedded
is stored in memory, and this is the less important data. The more
important data is encoded directly. The least important data comprises
some number of the least significant bits.

In one embodiment, if a portion of each coefficient is written back
to memory for encoding later, it must be known which bits are head bits
and which are tail bits, as well as whether the sign bit has been coded, in
order to ensure proper encoding. In one embodiment, two or more
signaling bits (e.g., 3, 4, 5, etc.) are used to indicate the head, tail and sign
bit information.

In one embodiment, where 8-bit memory locations are used, two
signaling bits indicate the head, tail and sign bit information. The use of
two signaling bits allows the least important 6 importance levels to be
written back to memory with the two signaling bits. One signaling bit
indicates whether the most significant bit of the 6 importance levels is a
head or tail bit. If the first signaling bit indicates that it is a head bit, then
the second signaling bit is the sign for the coefficient. On the other hand,
if the first signaling bit indicates that the most significant bit of the data
written back to memory is a tail bit, then the second signaling bit is a free
signaling bit which can indicate additional tail information, such as, for
example, whether the most important tail bit is the first tail bit or a later
tail bit.

Figure 13A shows a coefficient divided into most important data
1301, referred to as the MIC, and less important data 1302, referred to as
the LIC. In one embodiment, the MIC comprises the 6 higher order bits
of each coefficient, while the LIC comprises the 6 lower order bits. Most
important data 1301 is sent to the context model to be coded immediately
in coefficient order. No buffering in external memory is necessary for
this data. Less important data 1302 is written to memory (e.g., RAM) to
be coded later and embedded by order. In addition, the two signaling bits
are included in the data written to memory. Signaling bit 1303 indicates
whether the most significant bit in the data written to memory is a head
bit. Signaling bit 1304 gives the sign for the coefficient or indicates if the
first tail bit is contained in the data or not. Note that the signaling bits may
be stored in a concatenated fashion with less important data 1302 or may
be stored in another memory or memory location that is associated with
the memory storing less important data 1302 so that the signaling bits
associated with each portion of a coefficient may be identified.

Examples in Table 5 show the use of the two signaling bits. The
columns of the body of Table 5 are intended to line up with the data types
in Figure 13A. Sign bits are denoted with "S", tail bits are denoted with
"T", do not care bits are denoted with "x", and the value of the tail-on bit is
denoted with "h" or "t". In Table 5, h=0 and t=1 for the signaling bits. In
an alternative embodiment, the conventions may be reversed. In one
embodiment, a sign bit in Table 5 of 0 indicates a positive sign, while a
sign bit in Table 5 of 1 indicates a negative sign. An opposite assignment
may be used. Note the sign bit is always kept with the first "on" bit, so it
can be coded at the same time for embedding.
Table 5

magnitude     most important   less important         signaling
              (lossless)       (bitplane embedded)    bits
1xxxx         000000 x         01TTTT                 h S
1xxxxx        000000 x         1TTTTT                 h S
1xxxxxx       000001 S         TTTTTT                 t 0
1xxxxxxx      00001T S         TTTTTT                 t 1
1xxxxxxxx     0001TT S         TTTTTT                 t 1
1xxxxxxxxx    001TTT S         TTTTTT                 t 1
In Table 5 above, the "x" refers to the corresponding bit in the coefficient
and may be a 0 or 1.
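The two comparisons that produce the Table 5 signaling bits can be sketched in Python. The comparisons are "MIC equal to 0" and "MIC equal to 1". The sketch assumes the 6-bit LIC of Figure 13A, h=0 and t=1, and a sign bit of 1 for a negative coefficient. The names are illustrative.

```python
LIC_BITS = 6  # less important data: the 6 low-order bitplanes (Figure 13A)


def split_with_signaling(magnitude: int, negative: bool):
    """Split a magnitude into MIC and LIC parts and derive the two signaling
    bits, returned as (mic, lic, bit_1303, bit_1304)."""
    mic = magnitude >> LIC_BITS
    lic = magnitude & ((1 << LIC_BITS) - 1)
    if mic == 0:
        # Still in the head: the top LIC bit is a head bit, so 1304 holds the sign.
        return mic, lic, 0, int(negative)
    # Already in the tail: 1304 is 0 when the first tail bit is the top LIC bit
    # (MIC == 1) and 1 when the first tail bit was already in the MIC.
    return mic, lic, 1, 0 if mic == 1 else 1


print(split_with_signaling(0b1000000, False))  # the "t 0" row of Table 5
```

Bit 1303 corresponds to the "MIC equal to 0" comparison. Bit 1304 is either the sign or the "MIC equal to 1" result, selected by bit 1303.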

In one embodiment, during decoding, when the most important
data is decoded, it is written to memory, and at the same time, the proper
two signaling bits are written to memory to initialize the memory for
storing the less important data. (Depending on the alignment of the
coefficients, some of the most important data may be stored in the second
byte also.) With this initialization, decoding the less important data one
bitplane at a time only requires reading and then writing one byte (or less
in some embodiments) per coefficient. When the coefficients are read to
be input to the inverse transform, they are converted into a normal
numerical form (e.g., two's complement form).

In addition to having "most important data" and "less important
data", there may also be data that is discarded or quantized during
encoding. Coefficients are divided by a quantization scale factor Q.
(Quantization of coefficients is described in the JPEG Standard.) In the
present invention, the quantization is a power of two, since division is
accomplished by discarding bitplanes. For instance, Q=1 represents
division by 1 and, thus, the coefficients don't change, while Q=2
represents division by 2, which means one bitplane is discarded. These
divisions may be implemented using shifts (e.g., shift by one bit position
for Q=2). Figures 13B and 13C illustrate the format of the most important
and less important data when both quantization and coefficient
alignment for different subbands are taken into account.
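Because Q is a power of two, dividing by Q amounts to dropping log2(Q) low-order bitplanes of the magnitude, with the sign kept separately. The minimal Python sketch below assumes sign-magnitude handling, as in a barrel shifter that receives the coefficient magnitude. The function name is illustrative.

```python
def quantize(coefficient: int, q: int) -> int:
    """Divide by a power-of-two Q by discarding low bitplanes of the
    magnitude; the sign is handled separately (sign-magnitude form)."""
    if q < 1 or q & (q - 1):
        raise ValueError("Q must be a power of two")
    shift = q.bit_length() - 1        # Q = 2**shift, so `shift` bitplanes go
    magnitude = abs(coefficient) >> shift
    return -magnitude if coefficient < 0 else magnitude


print(quantize(-7, 2), quantize(13, 4), quantize(5, 1))
```

Shifting the magnitude truncates toward zero. An arithmetic shift of a two's complement value would instead round negative values toward minus infinity.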

Figure 13B shows the lossless case where no data is discarded.
Following the convention of JPEG, this is called quantization Q=1,
because the actual coefficients are divided by 1 (lossless). The most
important data is indicated without cross-hatching, while the least
important data is cross-hatched.

Figure 13C shows the case where one bitplane of data has been
discarded (i.e., Q=2), because discarding a bitplane is equivalent to
division by 2. The discarded bitplane is shown in black.

Note that in addition to what is shown in Figures 13B and 13C, the
most important data also includes the SS coefficients. Although
coefficients are shown for eight-bit data, the use of a reversible color space
would require nine-bit data, increasing the size of chrominance
coefficients by one bit.

In the present invention, the sign bit context model comprises
encoding the sign after the last head bit. There are three contexts for the
sign, depending on whether the N coefficient is positive, negative, or its
sign is not yet coded. Alternatively, one context can be used for the sign,
or the sign can always be coded at 50% probability.
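The three-context sign model is a three-way selection on the state of the N neighbor. A minimal Python sketch follows. The numeric context indices and the way the neighbor's state is passed in are hypothetical.

```python
from typing import Optional

# Hypothetical context indices for the three sign contexts.
SIGN_CTX_NOT_CODED, SIGN_CTX_POSITIVE, SIGN_CTX_NEGATIVE = 0, 1, 2


def sign_context(north_sign: Optional[int]) -> int:
    """Select the sign context from the north (N) neighbor.

    north_sign is None if the N coefficient's sign is not yet coded,
    0 if it is positive and 1 if it is negative.
    """
    if north_sign is None:
        return SIGN_CTX_NOT_CODED
    return SIGN_CTX_NEGATIVE if north_sign else SIGN_CTX_POSITIVE
```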

Order of Coding for Wavelet Coefficients 

One embodiment of the ordering of coding for wavelet coefficients
is summarized in the following pseudo-code:

    code the most important data
    code the position of the first less important bitplane with data
    for each less important data bitplane do
        code a less important data bitplane



When the most important data is encoded, the first bitplane in the
less important data that is not comprised entirely of zero head bits is
determined for each coefficient. This allows the encoder and decoder to
look ahead over entire bitplanes of less important data. This is especially
useful for coding units of black and white data where all the information
is in the K coefficients and the CMY coefficients are all zero. Not coding
bitplanes individually helps compression ratio, particularly if R2(7) is the
longest run length code available. (See U.S. Patent Nos. 5,381,145 and
5,583,500 for a description of "R2" codes.) However, if the four parallel
coding cores operate on components synchronously, the speed of
processing is determined by the component with the most bitplanes to
code; cores assigned to other components are idle during uncoded
bitplanes.

A flow chart illustrating one embodiment of the operation of the
pseudo code above is shown in Figure 14. Referring to Figure 14, the
context model begins by coding the most important chunk (MIC)
(processing block 1401). After coding the MIC, the processing logic codes
the position of the first least important chunk (LIC) bitplane with data
(processing block 1402). This is for the entire coding unit. Either 0, 1, 2, 3,
4, 5 or 6 bitplanes will contain data if there are 6 bitplanes in the LIC.
Then, the processing logic sets a current LIC bitplane variable to the first
LIC bitplane with data (processing block 1403).

Next, a test determines if all the LIC bitplanes with data have been
coded (processing block 1404). If so, the process ends; if not, the
processing logic codes an LIC bitplane (processing block 1405) and sets the
current LIC bitplane variable to the next LIC bitplane (processing block
1406). Thereafter, processing loops back to processing block 1404.
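The Figure 14 flow, together with finding the first LIC bitplane with data, can be sketched in Python. This minimal sketch assumes a 6-bit LIC and per-coefficient (MIC, LIC) pairs. Under the text's definition, a bitplane can be skipped only if it consists entirely of zero head bits. A coefficient with a non-zero MIC is already in its tail, so all of its LIC bitplanes carry data. The callbacks are hypothetical stand-ins for the coder.

```python
def lic_planes_with_data(parts, lic_bits=6):
    """Number of LIC bitplanes, counted up from the LSB, that must be coded.

    parts is a list of (mic, lic) pairs. Any non-zero MIC means tail bits
    in every LIC plane; otherwise only planes up to the highest 1 bit
    found in any LIC value hold data.
    """
    if any(mic for mic, _ in parts):
        return lic_bits
    combined = 0
    for _, lic in parts:
        combined |= lic
    return combined.bit_length()


def code_coding_unit(parts, code_mic_pass, code_first_plane, code_lic_plane, lic_bits=6):
    """Figure 14 control flow; the code_* arguments are hypothetical hooks."""
    code_mic_pass()                                  # block 1401
    n = lic_planes_with_data(parts, lic_bits)
    code_first_plane(n)                              # block 1402
    for plane in range(n - 1, -1, -1):               # blocks 1403, 1404, 1406
        code_lic_plane(plane)                        # block 1405


log = []
code_coding_unit([(0, 0b000101), (0, 0b001000)],
                 lambda: log.append("MIC"),
                 lambda n: log.append(f"first={n}"),
                 lambda p: log.append(f"plane {p}"))
print(log)
```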

Order of Coding for Most Important Data 

One embodiment of the order of coding for the most important
data is as follows:

    for each tree do
        code the SS coefficient
        perform MIC lookahead (or perform tree lookahead)
        for each non-SS coefficient do
            for each bit (plane) with data do
                code head or tail bit
            if the coefficient is not zero
                code sign bit

The most important data is processed one wavelet tree at a time.
To reiterate, it is not embedded. An MIC look-ahead determines
bitplanes that are all zero head bits for all non-SS coefficients in the
wavelet tree. In one embodiment, a four-bit number is sufficient to
identify the first bitplane to code individually. In an alternate
embodiment shown in Figure 15, one bit is used to indicate that all non-SS
coefficients 1501 of the second decomposition (hatched region) are zero
and another bit to indicate that all non-SS coefficients 1503 of the first
decomposition are zero. These two bits are used in addition to the four
bits used to specify the first bitplane.

In an alternate embodiment, a tree lookahead may be used where
the SS coefficients are coded and then, for the whole tree, the first
bitplane with non-zero head bits is coded.

To account for context revisit delay if conditioning is used for the
SS and first bitplane coding, the actual coding/decoding of bits of the SS
coefficient (which is 9 bits if a reversible color space is used) and the
look-ahead value can be alternated. If conditioning is not used, alternating is
not required.

As discussed previously, the context model of the present
invention uses a look-ahead. One embodiment of the look-ahead may be
employed for the most important data, i.e., the most important chunk
(MIC). In one embodiment, as shown in Figure 15, for each tree, 6 bits are
used: 4 for the maximum bitplane, 1 for level 0 all zero, and 1 for level 1
all zero. If the maximum bitplane is zero, then the two extra bits are
redundant, but this is not important. Otherwise, one adaptive coding
decision is used to decide "(isolated) zero/non-zero". Non-zero
coefficients may be further specified by:

• One M-ary operation to determine the value and sign of the
  coefficient. (Total: 2 cycles per coefficient.)
• One adaptive coding decision is used to decide "±1/not ±1".
  A second cycle is used to get the sign when the magnitude is
  1 and the sign and value for magnitudes greater than 1.
  (Total: 3 cycles per coefficient.)

• Similarly, "±1/not ±1", "±2^/not ±2^", and so on could be
  done for a total of 4 cycles per coefficient.

• The following procedure:

      if all bitplanes in the MIC are not zero then
          adaptively code a decision "-1, 0, 1" or "other"
          if "-1, 0, 1" then
              adaptively code a decision "0" or "-1, +1"
              if "-1, +1" then
                  specify sign bit
          else
              adaptively code a decision "-3, -2, 2, 3" or "other"
              if "-3, -2, 2, 3" then
                  specify "-2, 2" or "-3, 3" with one bit
                  specify sign bit
              else
                  specify value with the maximum number of bits that
                      was determined for the tree
                  specify sign bit

It should be noted that "specifying" a bit or bits can be coding adaptively,
coding at 50% probability, or simply copying bits to the coded data stream.
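The last procedure in the list above can be expressed as a function that lists the decisions made for one MIC coefficient. This minimal Python sketch rests on three assumptions. First, the tree's MIC is not all zero. Second, values of magnitude 4 or more are specified as max_bits magnitude bits, most significant first, followed by the sign. Third, the one-bit "-2, 2"/"-3, 3" choice is 0 for magnitude 2. All names are illustrative.

```python
def mic_value_decisions(value: int, max_bits: int) -> list:
    """(kind, symbol) pairs for one coefficient: "adaptive" marks adaptively
    coded decisions, "specify" marks bits that may be coded adaptively,
    coded at 50%, or copied."""
    mag, sign = abs(value), int(value < 0)
    out = []
    if mag <= 1:
        out.append(("adaptive", "-1,0,1"))
        if mag == 0:
            out.append(("adaptive", "0"))
        else:
            out.append(("adaptive", "-1,+1"))
            out.append(("specify", sign))
        return out
    out.append(("adaptive", "other"))
    if mag <= 3:
        out.append(("adaptive", "-3,-2,2,3"))
        out.append(("specify", mag - 2))  # 0 for "-2, 2", 1 for "-3, 3"
        out.append(("specify", sign))
        return out
    out.append(("adaptive", "other"))
    for plane in range(max_bits - 1, -1, -1):  # tree's maximum number of bits
        out.append(("specify", (mag >> plane) & 1))
    out.append(("specify", sign))
    return out


print(mic_value_decisions(-5, 3))
```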

If all or most of the bitplanes are to be individually coded, some
levels of the transform may have unused bitplanes due to alignment;
unused bitplanes are never coded. There are a number of options for
handling bit-to-context delay for the head and tail bits. One method is to
do three coefficients in alternation: a DD, an SD and then a DS. The sign bit
for non-zero coefficients can be coded at the end of the coefficient; since
all of the most important data is always lossless, exactly following the
first "on" bit is not necessary.

One embodiment of a flow chart illustrating the pseudo code for
coding the most important chunk is shown in Figure 16. Referring to
Figure 16, the process begins with the processing logic setting the current
tree to the first tree (processing block 1601). Then, the processing logic
codes the SS coefficient (processing block 1602). After coding the SS
coefficient, the processing logic codes the position of the first bitplane
with data in the MIC of the tree (processing block 1603) or performs the
MIC lookahead.

Then, the processing logic tests whether the MIC of the entire tree
is zero (processing block 1604). If the MIC of the entire tree is zero,
processing continues at processing block 1614; otherwise, processing
transitions to processing block 1605, where the processing logic sets the
current coefficient to the first non-SS coefficient in the tree.

After setting the current coefficient to the first non-SS coefficient
in the tree, the processing logic sets the current bitplane to the first
bitplane with data (processing block 1606). Then, the processing logic
codes a bit of the current coefficient in the current bitplane (processing
block 1607). Afterwards, the processing logic tests whether all the
bitplanes have been coded (processing block 1608). If all the bitplanes
have not been coded, the processing logic sets the current bitplane to the
next bitplane (processing block 1609) and transitions to processing block
1607. If all the bitplanes have been coded, the processing logic tests
whether the current coefficient is zero (processing block 1610). If the
current coefficient is not zero, the processing logic codes the sign bit
(processing block 1611) and processing transitions to processing block
1613. If the current coefficient is zero, the processing logic
transitions to processing block 1613.

At processing block 1613, the processing logic tests whether all
coefficients in the tree have been coded. If all the coefficients in the tree
have not been coded, then the processing logic sets the current coefficient
to the next coefficient in the tree (processing block 1612) and
processing transitions to processing block 1606. If all of the coefficients in
the tree have been coded, then the processing logic tests whether all trees
have been coded (processing block 1614). If all the trees have been coded,
processing ends; otherwise, processing transitions to processing block
1615, where the processing logic sets the current tree to the next tree, and
processing transitions to processing block 1602.
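The Figure 16 control flow can be sketched in Python with the coding operations passed in as callbacks. This minimal sketch makes two assumptions. First, the look-ahead sends the number of MIC bitplanes with data for the tree: the four-bit form, with the two extra Figure 15 bits omitted. Second, the magnitudes given are already reduced to their MIC parts. The callback names are hypothetical hooks, not part of the described apparatus.

```python
def code_mic(trees, code_ss, code_lookahead, code_bit, code_sign):
    """Figure 16 control flow (sketch).

    trees is a list of (ss_value, coeffs), where coeffs holds
    (mic_magnitude, negative) pairs for the tree's non-SS coefficients.
    """
    for ss, coeffs in trees:                          # blocks 1601, 1614, 1615
        code_ss(ss)                                   # block 1602
        combined = 0
        for mag, _ in coeffs:
            combined |= mag
        planes = combined.bit_length()                # MIC bitplanes with data
        code_lookahead(planes)                        # block 1603
        if planes == 0:                               # block 1604: MIC of tree is zero
            continue
        for mag, negative in coeffs:                  # blocks 1605, 1612, 1613
            for plane in range(planes - 1, -1, -1):   # blocks 1606-1609
                code_bit((mag >> plane) & 1)          # head or tail bit
            if mag != 0:                              # block 1610
                code_sign(negative)                   # block 1611


events = []
code_mic([(9, [(0b101, False), (0, True)])],
         lambda ss: events.append(("SS", ss)),
         lambda n: events.append(("planes", n)),
         lambda b: events.append(("bit", b)),
         lambda neg: events.append(("sign", neg)))
print(events)
```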

Figure 17 is a block diagram of one embodiment of the formatting
unit and context model used during the most important data coding pass.
Referring to Figure 17, a barrel shifter 1701 is coupled to receive the
magnitude of the coefficient and a quantization level that was used
during encoding to prevent the most important data from exceeding the
minimum disk bandwidth, ensuring lossless decompression. Thus, the
quantization level controls barrel shifter 1701. In one embodiment,
barrel shifter 1701 shifts the magnitude bits by 0, 1, 2 or 3 to support
quantizations of 1, 2, 4 or 8. In an alternative embodiment, a lower or
higher number of quantizations is supported, such as only two
quantizations.

The output of barrel shifter 1701 comprises the lower order six
bitplanes, which are the less important data, and the rest of the higher order
bits, which are the most important data. In an alternate embodiment, a
simple separation mechanism is used to produce these two outputs.

Both outputs of barrel shifter 1701 are input to first bitplane unit
1702, which determines which bitplanes have data in them. First
bitplane unit 1702 is used to find the bitplane with the first "on" bit for the
entire coding unit (see Figure 10) for use when processing the less
important data. Another first bitplane unit, 1706, is coupled to receive the
most important data output from barrel shifter 1701 as well. First
bitplane unit 1706 is used for each tree when processing the more
important data. One embodiment of the first bitplane unit is described
below with reference to Figure 18.

Barrel shifter 1701 is also coupled to comparison units 1703 and
1704, which perform two comparisons on the most important data to
generate the two-bit signaling information for the less important data.
Comparison unit 1703 determines if the most important data is equal to
0, thereby indicating whether a tail bit has occurred already (i.e., whether
coding is in the tail yet). The output of comparison unit 1703 is the
tail-on bit. Comparison unit 1704 determines whether the most important
data is equal to 1. If the most important data is equal to 1, then, from
Table 5 above, the output is 0. The output of comparison unit 1704 is
coupled to one input of multiplexer (MUX) 1705. The other input to
mux 1705 is coupled to receive the sign bit. A select input of mux 1705 is
controlled by the output of comparison unit 1703, such that if the output
of comparison unit 1703 indicates that the bit is a tail bit, then the output
of mux 1705 is a "first tail" bit 1304. However, if the output of
comparison unit 1703 indicates that the bit is a head bit, then mux 1705
is controlled to output the sign.

In one embodiment, comparison units 1703 and 1704 may be
implemented using simple bit comparators.

A memory 1707 is coupled to receive the sign bit, the most
important data output from barrel shifter 1701 and the output of bitplane
unit 1706. Memory 1707 is used to delay coefficients so that parent and
neighboring information is available for the conditioning. The
organization of memory 1707 is discussed below.

Context models (CM) 1710-1712 provide conditioning for the sign,
head, tail and other bits. Each of these context models is described
below.

Figure 18 illustrates one embodiment of a first bitplane unit.

Referring to Figure 18, first bitplane unit 1800 comprises an OR gate 1801
coupled to receive a coefficient and a feedback from the output of a
register 1802. The output of OR gate 1801 is coupled to the input of
register 1802. Register 1802 is controlled by a start of tree/coding unit
reset indication. The output of register 1802 is coupled to a priority
encoder 1803. The output of priority encoder 1803 is the output of
first bitplane unit 1800.

At the start, register 1802 is cleared. Each bit of register 1802 is
ORed with each bit of the input coefficient using OR gate 1801. For each
bit of the coefficient that is 0, register 1802 keeps its current value,
which is output to priority encoder 1803. For each bit of the
coefficient that is a 1 (e.g., the first one), the output of OR gate 1801 to
register 1802 is a 1, which is output to priority encoder 1803. Priority
encoder 1803 then locates the first 1, which is the first bitplane of
the coefficient that has a 1.
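The OR-accumulate and priority-encode behavior of first bitplane unit 1800 can be modeled in a few lines of Python. This is a sketch under stated assumptions: magnitudes are non-negative integers, bitplanes are numbered by bit position (0 is the least significant), and "first" means the highest plane containing a 1. The function name `first_bitplane` and the `width` parameter are hypothetical.

```python
from typing import Iterable, Optional


def first_bitplane(magnitudes: Iterable[int], width: int) -> Optional[int]:
    """Model of first bitplane unit 1800 for one tree or coding unit.

    Returns the highest bit position set in any magnitude, or None when every
    magnitude is zero.
    """
    register = 0                          # register 1802, cleared at reset
    mask = (1 << width) - 1               # keep only the bits the hardware sees
    for m in magnitudes:
        register |= m & mask              # OR gate 1801 feeding register 1802
    if register == 0:
        return None                       # no bitplane contains a 1
    return register.bit_length() - 1      # priority encoder 1803: highest set bit
```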

Order of Processing for Less Important Data

Each bitplane for the less important data is processed as follows:

    for each tree do
        for each coefficient do
            if start of look-ahead interval
                do look-ahead
            if look-ahead not active
                code head or tail bit
                if first "on" bit
                    code sign bit

One embodiment of the process of coding a UC bitplane is shown
in the flow chart of Figure 19. The process of coding a UC bitplane
begins with processing logic setting the current tree to the first tree
(processing block 1901). Then, the processing logic sets the current
coefficient to the first non-SS coefficient in the tree (processing block
1902). After setting the current coefficient to the first non-SS coefficient
in the tree, the processing logic tests whether the coding is at the start of a
look-ahead interval (processing block 1903). If the coding process is at the
start of a look-ahead interval, the processing logic performs a look-ahead
(processing block 1904) and processing continues at processing block 1905. If the
coding process is not at the start of a look-ahead interval, processing logic
transitions directly to processing block 1905. At processing block 1905,
the processing logic determines whether look-ahead is active.

If look-ahead is active, processing continues at processing block
1909, where the processing logic determines if all the coefficients in the
tree are coded. If all the coefficients in the tree are coded, processing
continues at processing block 1913; otherwise, the processing logic sets the
current coefficient to the next coefficient in the tree after the look-ahead
interval (processing block 1910) and the processing transitions to
processing block 1903.

If the look-ahead is not active, the processing logic codes the head
or tail bit (processing block 1906) and then tests whether the first non-zero
bit has been received (processing block 1907). If the first non-zero bit
has not been received, processing continues at processing block 1911. If
the first non-zero bit has been received, processing continues at
processing block 1908, where the processing logic codes the sign bit, and
processing then transitions to processing block 1911.

At processing block 1911, the processing logic determines whether
all coefficients in the tree have been coded. If all coefficients in the tree
have not been coded, the processing logic sets the current coefficient to
the next coefficient in the tree (processing block 1912) and transitions to
processing block 1903. If all the coefficients in the tree have been coded,
the processing transitions to processing block 1913, where the processing
logic tests whether all trees have been coded. If all the trees have not
been coded, processing logic sets the current tree to the next tree
(processing block 1914) and processing continues at processing block 1902.
If all the trees have been coded, the processing ends.
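The Figure 19 flow chart maps onto a nested loop. The following Python sketch is one possible reading. It assumes that look-ahead intervals are counted from the first non-SS coefficient of each tree, that an active look-ahead covers the whole interval, and that the look-ahead and bit-coding steps are supplied as callbacks. The names `code_uc_bitplane`, `look_ahead`, `code_head_or_tail` and `code_sign` are hypothetical.

```python
from typing import Callable, Sequence


def code_uc_bitplane(trees: Sequence[Sequence[int]],
                     interval: int,
                     look_ahead: Callable[[Sequence[int], int], bool],
                     code_head_or_tail: Callable[[int], bool],
                     code_sign: Callable[[int], None]) -> None:
    """Walk the Figure 19 flow chart for one UC bitplane.

    trees: for each tree, its non-SS coefficients in coding order.
    look_ahead(tree, i): run at the start of each interval; True means the
        look-ahead decision covers the whole interval (look-ahead active).
    code_head_or_tail(c): codes one head or tail bit and returns True when
        that bit is the coefficient's first "on" bit.
    """
    for tree in trees:                                # blocks 1901, 1913, 1914
        i = 0                                         # block 1902
        active = False
        while i < len(tree):                          # blocks 1909, 1911
            if i % interval == 0:                     # block 1903
                active = look_ahead(tree, i)          # block 1904
            if active:                                # block 1905
                i = (i // interval + 1) * interval    # block 1910: skip interval
                continue
            if code_head_or_tail(tree[i]):            # blocks 1906, 1907
                code_sign(tree[i])                    # block 1908
            i += 1                                    # block 1912
```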

Processing a wavelet tree at a time may not be important, but since 
the transform causes data to be read and written in that order, it may be 
convenient. If data is processed by wavelet trees, bit to context delay can 
be accommodated by alternating between DD, SD and DS coefficients 
(alternating between sub-trees). Otherwise, one subband at a time can be 
coded. Regardless of the order chosen, unused head/tail bits due to
alignment of different subbands are never coded and do not require idle 
cycles. 

Figure 20 is a block diagram of one embodiment of the look-ahead
and context models for less important data. In one embodiment, the
most important data and the less important data use the same context
models (CM) that provide conditioning for the sign, head and tail bits.

Referring to Figure 20, context models 2001-2003 are coupled to the
input data. A sign context model 2001 is coupled to receive the tail-on
bit, a sign/first tail bit signal, and the data. The head bit context model
2002 is coupled to receive the tail-on bit and the data. The tail bit context
model 2003 is coupled to receive the tail-on bit, a sign/first tail bit signal,
and the data. In response to their inputs, each of context models 2001-2003
generates a context.

The contexts generated by context models 2001-2003 are coupled to
inputs of mux 2004. Mux 2004 is controlled by the previous bits and the
bit significance representation itself. The head context model 2002 is
used until a 1 bit is seen at the data input. The sign context model 2001 is
used when the last bit was the first 1 bit of the head. Thereafter, the tail
context model 2003 is used.
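The mux 2004 selection rule can be illustrated by listing which context model handles each decision for a single coefficient. This Python sketch assumes bitplanes are visited from most to least significant, and that a negative coefficient's sign bit is coded as 1. Both of these are assumptions. The name `decision_sequence` is hypothetical.

```python
from typing import List, Tuple


def decision_sequence(magnitude: int, negative: bool,
                      planes: int) -> List[Tuple[str, int]]:
    """(model, bit) decisions for one coefficient, most significant plane first."""
    decisions = []
    seen_one = False
    for p in range(planes - 1, -1, -1):
        bit = (magnitude >> p) & 1
        if not seen_one:
            decisions.append(("head", bit))                # head context model 2002
            if bit:
                seen_one = True
                decisions.append(("sign", int(negative)))  # sign context model 2001
        else:
            decisions.append(("tail", bit))                # tail context model 2003
    return decisions
```

A coefficient that never reaches a 1 bit produces only head decisions. Once the first 1 is seen, exactly one sign decision follows, and every later plane produces a tail decision.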

The output of mux 2004 is coupled to "head?" unit 2005 and first-in/first-out
(FIFO) buffer 2006. The "head?" unit 2005 tests if the current
context is a head bit context with zero head bits in the neighborhood and
parent. If all the contexts are in the head, a signal from "head?" unit
2005 clears FIFO 2006.

The contexts and results are buffered in FIFO 2006 or other
memory for the look-ahead interval. At the end of the interval, if
necessary, a look-ahead decision and/or individual decisions are coded. If
the coefficients are processed one wavelet tree at a time, the FIFO for
look-ahead can be a single FIFO used for all subbands, or multiple FIFOs
can be used, one for each subband.
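One plausible reading of the look-ahead interval is sketched below in Python. It is an illustration under explicit assumptions, not the circuit of Figure 20. The assumed behavior is as follows. An interval qualifies for look-ahead only when every buffered decision uses a head-bit context with zero head bits in the neighborhood and parent. A qualifying interval is first coded as one look-ahead decision (0 means "all zero"). Only when that decision is 1, or when the interval does not qualify, are the buffered individual decisions coded. The name `code_with_look_ahead` and its input format are hypothetical.

```python
from typing import List, Sequence, Tuple


def code_with_look_ahead(decisions: Sequence[Tuple[bool, int]],
                         interval: int = 8) -> List[Tuple[str, int]]:
    """decisions: (zero_head_context, bit) pairs in coding order, where
    zero_head_context is what the "head?" test reports for that decision."""
    coded = []
    for start in range(0, len(decisions), interval):
        fifo = list(decisions[start:start + interval])    # FIFO 2006 contents
        qualifies = len(fifo) == interval and all(z for z, _ in fifo)
        if qualifies:
            any_one = any(bit for _, bit in fifo)
            coded.append(("look-ahead", int(any_one)))    # one decision for the interval
            if not any_one:
                continue                                  # all zero: nothing else coded
        coded.extend(("individual", bit) for _, bit in fifo)
    return coded
```

With this rule, a run of eight all-zero head bits in a quiet neighborhood costs a single coded decision instead of eight.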

Note that if it were convenient to reduce multiplexing, the most
important data could use look-ahead too. However, it may be somewhat
redundant to use both look-ahead and first bitplane for each tree.

If a core assigned to one component codes a sign bit, cores assigned
to any other components that do not code a sign bit at the same bitplane
will be idle. Therefore, up to four clock cycles could be used for sign bits
if each core codes a sign bit on a different bitplane. In one embodiment,
there are up to six head or tail bits per coefficient.

One possible timing problem is that the most important chunk
compresses sufficiently well that the disk is idle during the decoding of a
portion of that data. If there is sufficient memory bandwidth to the band
buffer, look-ahead may be used to process the most important data faster.
Then the less important data can get a head start. Also, it would be good
if the disk had a burst transfer rate that was higher than the maximum
sustained rate. Hard disks normally have a significant buffer, and
perhaps reading ahead into this buffer would eliminate the idle time.

Conditioning a Portion of the Context Model 

The conditioning used in the context model is dependent on
hardware cost versus compression trade-offs. Therefore, in the following
sections, many options for conditioning are presented for designers to
consider.

Context Model for SS Coefficients

In one embodiment of the context model, SS coefficients are not
coded. Since they make up only 1/256th of the original data, there is little
gain to coding them. If coding them is desired, they could be handled by
Gray coding, conditioning on the previous bit in the same coefficient, and/or
on the corresponding bit in the previous coefficient.

Context Model for First Bitplane Information

The four bits of first bitplane information for the most important
data of each wavelet tree can be treated in a similar fashion to the SS
coefficients. They increase the size of the original data by only 1/512th. In
one embodiment, they can be left uncoded due to their small size compared
to the original data, or they can undergo Gray coding and some conditioning.

Similarly, if six bits are used according to Figure 15, they can be
treated like SS coefficients.

Context Model for Head Bits

Figure 21 is a block diagram of one embodiment of the context
model which provides the conditioning for head bits. Referring to
Figure 21, context model 2100 contains shift registers like those found in
a bitplane context model. An important difference is that instead of
using previous coefficient bits from the current bitplane, conditioning is
based on tail-on information, which uses all previous bitplanes and
previously coded information in the current bitplane. Also, some bits to
identify the bitplane coded or the group of bitplanes coded and the
subband or group of subbands coded are generated by the importance
level and subband bucketing.

Referring to Figure 21, the context model comprises two inputs,
the current significance level 2110 and the coefficients from memory 2111.
The current significance level 2110 is coupled to inputs of the tail-on
information/bit generator(s) block 2101 and the importance level and
subband bucketing block 2102. The coefficients from memory are also
coupled to block 2101 and the registers 2103-2106.

Block 2101 takes the coefficients and determines if there is a one bit
or not. In one embodiment, block 2101 also determines where the one bit
is. The output of block 2101 is one or two bits based on the tail-on
information. In one embodiment, the tail information indicates whether
or not the first non-zero magnitude bit has been observed (e.g., whether
the first "on-bit" has been observed) and, if so, about how many
bitplanes ago. Table 6 describes the tail-information bits.
Table 6 - Definition of the tail information

Tail   Definition
0      no on-bit has been observed yet
1      the first on-bit was on the last bitplane
2      the first on-bit was two or three bitplanes ago
3      the first on-bit was more than three bitplanes ago

From the 2-bit tail information, a 1-bit "tail-on" value is synthesized to
indicate whether the tail information is zero or not. In one embodiment,
the tail information and the tail-on bits are updated immediately after
the coefficient has been coded. In another embodiment, updating occurs
later to allow parallel context generation.
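Table 6 and the tail-on bit derived from it can be expressed as a small lookup in Python. This is a sketch that assumes the decoder tracks, for each coefficient, how many bitplanes ago its first on-bit appeared (1 meaning the last bitplane). The function names `tail_information` and `tail_on_bit` are hypothetical.

```python
from typing import Optional


def tail_information(planes_since_first_on: Optional[int]) -> int:
    """2-bit tail information of Table 6.

    planes_since_first_on is None if no on-bit has been observed, otherwise
    how many bitplanes ago the first on-bit was (1 means the last bitplane).
    """
    if planes_since_first_on is None:
        return 0
    if planes_since_first_on < 1:
        raise ValueError("distance must be at least one bitplane")
    if planes_since_first_on == 1:
        return 1
    if planes_since_first_on <= 3:
        return 2
    return 3


def tail_on_bit(info: int) -> int:
    """1-bit tail-on value: 1 when the tail information is nonzero."""
    return int(info != 0)
```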

In addition, two bits may be used to indicate the importance
level being coded. The first two bitplanes use value 0, the second two 1,
the third two 2, and the remaining bitplanes 3. In addition, there is a
run-length encoding of the bits that are all zero head bits.

The 10 bits of context for the head bits include 2 bits of
information each from the parent and the West coefficients, 1 bit of
information from each of the North, East, SouthWest, and South
coefficients, and 2 bits of importance level information.
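The 10-bit head-bit context can be packed as follows (2 + 2 + 4 × 1 + 2 = 10 bits, so 1024 possible contexts). In this Python sketch, the parent and West fields are assumed to carry 2-bit tail information and the four single-bit neighbors their tail-on bits. The bit order within the packed index is also an assumption, since the text does not fix it. The names `importance_level` and `head_context` are hypothetical.

```python
def importance_level(plane_index: int) -> int:
    """2-bit importance level; plane_index 0 is the first bitplane coded."""
    return min(plane_index // 2, 3)


def head_context(parent_info: int, west_info: int, north: int, east: int,
                 southwest: int, south: int, plane_index: int) -> int:
    """Pack the 10-bit head-bit context into an index 0..1023."""
    for info in (parent_info, west_info):
        if not 0 <= info <= 3:
            raise ValueError("tail information is a 2-bit value")
    for bit in (north, east, southwest, south):
        if bit not in (0, 1):
            raise ValueError("neighbour tail-on values are single bits")
    ctx = (parent_info << 2) | west_info               # 2 + 2 bits
    for bit in (north, east, southwest, south):        # 4 x 1 bit
        ctx = (ctx << 1) | bit
    return (ctx << 2) | importance_level(plane_index)  # 2 bits
```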

In one embodiment, the tail information is not used for some or
all frequency bands. This allows a frequency band to be decoded without
previously decoding its parent.
In another embodiment, the assignment of the bitplanes of each
frequency band to importance levels uses one alignment. The
determination of tail-on information of the parent uses a second
alignment, which uses fewer bitplanes of the parent than have actually
been coded. This allows some bitplanes of a frequency band to be decoded
without decoding the corresponding bitplanes of the parent in the same
importance level (see Figure 38). For example, an image may be encoded
with pyramidal alignment, but with parent tail-on information based on
MSE alignment (see Figure 39). This allows the decoder to decode in
pyramidal alignment, to simulate MSE alignment, or to simulate any
alignment between pyramidal and MSE.

Referring back to Figure 21, the outputs of block 2101 are coupled
to the inputs of registers 2103-2106. Registers 2103-2106 accumulate the
neighborhood data. For instance, the above/left shift register maintains
bits for the line that is immediately above the current coefficient.
The current shift register contains the bits in the current line of
coefficients, while the below/right shift register 2105 contains the bits
from the line immediately below the current line. Lastly, parent register
2106 maintains the parent data. The outputs of the shift registers form
the context.

The output of importance level and bucketing block 2102 may also
be used for a context. It would be part of the context when the
subbands and different levels are to be coded to the same context. If that
is the case, the output of block 2102 is combined with the outputs of
registers 2103-2106 to form the context. If not, the context only comprises
the outputs of registers 2103-2106.

Also output from the context model 2100 is a bit.

Coding can be done by alternating between DD, SD and DS
coefficients (alternating between sub-trees) to allow for the bit-to-context
delay when using data from the current bitplane.

Note that memory is needed to store coefficients needed for
conditioning (see Figure 17). The memory usage for one embodiment of
the context model with conditioning on all neighbors and parents is
shown in Figure 22. A short seam transform order is assumed. (External
memory could be used to support a long seam transform order. This
would require both additional memory storage and bandwidth.)

Conditioning on high-level parents is especially costly. The level 4
DD coefficient for a given tree is not computed until 16 trees later than
most of the level 1 DD coefficients for that tree. Also, storing entire
coefficients to be coded later (unshaded in Figure 22) is much more costly
than only storing tail-on information for later use in conditioning
(cross-hatched in Figure 22). Conditioning only on "west" information that is
in the same tree and on parents that are generated without data from
"west" trees would greatly reduce the amount of memory required.
When parent or west information is not available, copying the
information from the north or east is useful.

Context Model for Sign Bits

The context model that provides conditioning for sign bits is
simple. If the sign of the above pixel is known, it is used for
conditioning. If the sign bit for the above pixel is unknown, then the bit
is uncoded (R2(0) is used). Alternatively, no coding (R2(0)) can be used for
all sign bits.

Figure 23 is a block diagram of one embodiment of the context
model for sign bits. Referring to Figure 23, a mux 2301 receives a north
sign bit 2303 and a 0 bit 2304 (hardwired) and is controlled by a north tail-on
bit 2302 to output the north sign bit 2303 if the north tail-on bit 2302 is
a 1; otherwise, mux 2301 outputs a 0. Thus, the north pixel supplies the
north tail-on bit 2302 and north sign bit 2303 to provide a context for the
pixel south of the north pixel.
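The sign-bit rule can be sketched in Python by combining the mux of Figure 23 with the sign-bit policy stated above: use the north sign when it is known, otherwise send the bit uncoded with R2(0). This gives two adaptive contexts plus one "no code" case. The string labels returned here and the name `sign_context` are hypothetical, chosen only for illustration.

```python
def sign_context(north_tail_on: int, north_sign: int) -> str:
    """Choose how to code a sign bit from its north neighbour.

    Mux 2301 passes the north sign only when the north tail-on bit is 1.
    """
    if not north_tail_on:
        return "R2(0)"                  # north sign unknown: send the bit uncoded
    return f"adaptive[{north_sign}]"    # one adaptive context per north sign value
```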
Context Model for Tail Bits

No conditioning is used for tail bits. In one embodiment, a fixed
probability state is used, and no probability update is used. Table 7 shows
three options for codes to use for tail bits. The second option, which uses
R2(1) and R2(0), is a good choice.
Table 7 - Probability states (codes) used for tail bits

Bit of tail:   1        2                    3, 4, ...
Option 1       R2(1)    golden ratio code    R2(0)
Option 2       R2(1)    R2(0)                R2(0)
Option 3       R2(0)    R2(0)                R2(0)

In one embodiment, the golden ratio code, which is good for
probabilities of M = 60%, L = 40%, is:

Input   Codeword
MMM     00
MML     110
ML      01
LM      10
LL      111
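The golden ratio code is a run-to-codeword mapping, and both the input runs and the codewords are prefix-free. The following Python sketch encodes, decodes and computes the expected rate. It reads the last row of the table as LL, which is the only two-symbol run not otherwise listed. The function names are hypothetical.

```python
import math

# Source runs of most probable (M) and least probable (L) symbols -> codewords.
GOLDEN_RATIO_CODE = {"MMM": "00", "MML": "110", "ML": "01", "LM": "10", "LL": "111"}
_DECODE = {code: run for run, code in GOLDEN_RATIO_CODE.items()}


def encode(symbols: str) -> str:
    """Parse an M/L string into runs and emit their codewords."""
    out, i = [], 0
    while i < len(symbols):
        for run, code in GOLDEN_RATIO_CODE.items():
            if symbols.startswith(run, i):      # the runs are prefix-free
                out.append(code)
                i += len(run)
                break
        else:
            raise ValueError("input does not end on a complete run")
    return "".join(out)


def decode(bits: str) -> str:
    """Invert encode(); the codewords are prefix-free."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in _DECODE:
            out.append(_DECODE[current])
            current = ""
    if current:
        raise ValueError("incomplete codeword")
    return "".join(out)


def bits_per_symbol(p_m: float) -> float:
    """Expected code bits per source symbol when P(M) = p_m."""
    p = {"M": p_m, "L": 1.0 - p_m}
    prob = {run: math.prod(p[s] for s in run) for run in GOLDEN_RATIO_CODE}
    code_bits = sum(prob[r] * len(c) for r, c in GOLDEN_RATIO_CODE.items())
    source_symbols = sum(prob[r] * len(r) for r in GOLDEN_RATIO_CODE)
    return code_bits / source_symbols
```

At P(M) = 0.6, the expected cost is 2.304 code bits per 2.36 source symbols, or about 0.976 bits per symbol. This is close to the binary entropy of about 0.971 bits, which is why the code suits a 60%/40% split.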

Context Bin Summary 

The minimum number of context bins that could be used in the
system is as follows. SS, first bitplane for each tree, sign and tail bits are all
not coded (the R2(0) code is used). Although no PEM state or most
probable symbol (MPS) bit needs to be stored, there must be logic to select
the R2(0) code. Therefore, depending on how this is counted, the
hardware cost is zero or one context bin. Adaptive coding should be used
for head bits. For less important data, since one bitplane at a time is
coded, conditioning on the bitplane is not important. For most
important data, the first bitplane for each wavelet tree may reduce the
number of bitplanes sufficiently that conditioning on the bitplane is not
important. It is less clear what the usefulness of conditioning on the
subband is, but this will also be ignored in this minimum context
example. The tail-on bits of three neighbors and one parent could be
used for a total of four bits (16 context bins). One additional context bin
can be used for look-ahead. (It may be more convenient to map two head
context bins together to make room for the look-ahead so the memory
size is still a power of 2.)
With four cores (requiring replicating contexts four times) and two
context memory banks per core, the minimum number of context bins to
use would be between 128 and 144, depending on how "not coded"
contexts are counted and whether two head context bins were mapped
together.
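The 128 to 144 range follows from 4 cores × 2 banks × 16, 17 or 18 bins per bank. The following Python sketch reproduces that arithmetic. It assumes that the look-ahead bin and the "not coded" R2(0) selection each add at most one bin per bank. This is one reading of the text, and the function name is hypothetical.

```python
CORES = 4
BANKS_PER_CORE = 2
HEAD_BINS = 2 ** 4  # tail-on bits of three neighbours and one parent


def min_context_bins(separate_look_ahead_bin: bool,
                     count_not_coded_bin: bool) -> int:
    """Total bins across all cores and banks for the minimum system."""
    per_bank = HEAD_BINS
    if separate_look_ahead_bin:
        per_bank += 1   # otherwise two head bins are merged to free a slot
    if count_not_coded_bin:
        per_bank += 1   # logic that selects the R2(0) code
    return CORES * BANKS_PER_CORE * per_bank
```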

A system with a generous amount of conditioning is as follows:

• For SS (9-bit) and first bitplane (4-bit) information, use 4 context bins
per bit, for a total of 52 context bins. (These could be divided
into banks; they do not have to be duplicated.)

• Tail bits are not coded, but both R2(0) and R2(1) are used.
Depending on how this is counted, this costs 0, 1, or 2
context bins.

• Two adaptive contexts and one "no code" context are used for
the sign bits.

• The head bits could use 8 bits from neighbors/parent and 2
bits for subband/bitplane information (1024 context bins).

• One context is used for look-ahead.
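Adding up the generous configuration gives a single-copy total of roughly 1079 to 1082 bins, most of which come from the 1024 head-bit contexts. This Python sketch assumes that "not coded" contexts are either all counted or all ignored, so it skips the intermediate tail-bit count of 1. It also does not model replication across cores or banks. The function name is hypothetical.

```python
def generous_context_bins(count_not_coded: bool) -> int:
    """Single-copy context bin total for the generous configuration."""
    ss_and_first_bitplane = (9 + 4) * 4        # 4 bins per bit -> 52
    tail = 2 if count_not_coded else 0         # fixed R2(0) and R2(1) codes
    sign = 2 + (1 if count_not_coded else 0)   # two adaptive plus "no code"
    head = 2 ** (8 + 2)                        # neighbours/parent + subband/bitplane
    look_ahead = 1
    return ss_and_first_bitplane + tail + sign + head + look_ahead
```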

An alternative embodiment of a context model, including an
embodiment of a sign/magnitude unit that converts input coefficients
into a sign/magnitude format, is described in GB-A-2303030, entitled
"Method and Apparatus For Compression Using Reversible Wavelet
Transforms and an Embedded Codestream", and GB-A-2303031, entitled
"Reversible Wavelet Transform and Embedded Codestream Manipulation",
and also JP-A-10-84484, entitled "Compression and Decompression with
Wavelet Style and Binary Style Including Quantization by
Device-Dependent Parser".

The context model provides contexts for entropy coding of the
data. In one embodiment, all the entropy coding performed by the
present invention is performed by binary entropy coders. A single coder
may be used to produce a single output code stream. Alternatively,
multiple (physical or virtual) coders may be employed to produce
multiple physical or virtual data streams.

M-ary Coding for UC

Figure 24 illustrates the use of M-ary coding for the UC. The use of
M-ary coding for reduced coding operates as a look-ahead (as shown). First,
the state of the next eight coefficients is examined. If there is anything in
the head, entropy coding is performed on the head bits, such that all head
bits are entropy coded, one per cycle, until all head bits in the eight are
coded. Referring to Figure 24, head bits which are 1 are coded in the first
and third cycles, while head bits that are 0 are coded in the second and
fourth cycles. Once all of the head bits are entropy coded, the sign and tail
bits are coded in the same cycle. For example, in Figure 24, all the sign
and tail bits that followed a head bit that is 1 are coded in the fifth cycle.
In this manner, the overall number of cycles is reduced.

Printing System Application of the Present Invention
Figure 25 is a block diagram of one embodiment of the front end of
a printer. Referring to Figure 25, a renderer 2501 receives data in the
form of a page description language or display list. Renderer 2501 may
comprise raster image processing. For each location (e.g., spot), renderer
2501 determines its color (e.g., black/white, 8-bit RGB values, or 8-bit CMYK
values, depending on the application). The output of renderer 2501 is a
set of pixels formatted into bands and stored in band buffer (memory)
2503.

In an alternative embodiment, data from a Page Description
Language (PDL) such as Adobe PostScript™ or Microsoft Windows™ GDI
is rendered into a display list. The display list is used to generate bands of
pixels. In this embodiment, it is assumed that the pixels represent
continuous-tone values, and any halftoning or dithering required by the
print engine will be performed after decompression.

In the present invention, the memory used for the band buffer
2503 is also used for workspace for compression (without increasing the
memory required). This dual use is described in more detail below.

Compressor 2504 compresses each band of pixels. If
the input to compressor 2504 is halftoned or dithered pixels, compressor
2504 would still work, but the compression achieved with wavelet
processes would likely be poor. A binary context model can be used on
halftoned or dithered pixels. Compressor 2504 writes the compressed data
to disk 2505. Disk 2505 may be a hard disk. In an alternative embodiment,
disk 2505 may be random access memory (RAM), Flash memory, an optical
disk, tape, any other type of storage means, or any type of communication
channel.

Figure 26 is a block diagram of one embodiment of the back end of
the printer. Referring to Figure 26, the back end of printer 2500 comprises
a decompressor 2602 coupled to disk 2505, a band buffer (memory) 2603
and a print engine 2604. Decompressor 2602 reads compressed data
from hard disk 2505 and decompresses it. The decompressed data is
stored in band buffer (memory) 2603 in the form of pixels. Band buffer
2603 may be the same memory as band buffer 2503, which operates as
workspace for compressor 2504. Decompressor 2602 keeps band buffer 2603
sufficiently full so that pixels can be sent to print engine 2604 in real
time.

Figure 27 is an alternative embodiment that includes an optional
enhancement. Referring to Figure 27, pixels from decompressor 2602 go
to band buffer 2603 via enhancement block 2705, while other
information, which is the information that is not yet pixels (partial
coefficients), is sent directly to band buffer 2603. Enhancement block 2705
may perform such functions as interpolation, smoothing, error diffusion,
halftoning and/or dithering.

The bandwidth needed between decompressor 2602 and band
buffer 2603 allows decompressor 2602 to first write transform coefficients
to band buffer 2603, access band buffer 2603 to obtain certain coefficients,
perform the inverse transform on such coefficients, and then write
them back to band buffer 2603. Note that band buffer 2603, as a work
space memory, may be small. For instance, if a full page image is 64
megabytes and band buffer 2603 is 16 megabytes, it would still be
considered a small work space memory.

In one embodiment, A4 images at 400 dpi with 32 bits/pixel (four
8-bit components, CMYK) printed at about 8 pages/minute require a data
rate of approximately 8 Mbytes/s from band buffer 2603 to print engine 2604.
The transfer rate of an exemplary hard disk is around 2 Mbytes per
second (e.g., 1.7-3.5 Mbytes/s). Therefore, a typical compression ratio of
about 4:1 is required to match the bandwidth of disk 2601 to the
bandwidth of the printer. In one embodiment, compressor 2504 in
Figure 25 and decompressor 2602 in Figures 26 or 27 are contained in a
single integrated circuit chip.
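The 8 Mbytes/s and 4:1 figures can be checked with a few lines of Python. This sketch assumes a full A4 page (210 × 297 mm) with no margins and decimal megabytes. The function names are hypothetical.

```python
MM_PER_INCH = 25.4


def page_bytes(width_mm: float = 210.0, height_mm: float = 297.0,
               dpi: int = 400, bytes_per_pixel: int = 4) -> float:
    """Uncompressed size of one page (A4, 400 dpi, 32 bits/pixel by default)."""
    pixels_w = width_mm / MM_PER_INCH * dpi
    pixels_h = height_mm / MM_PER_INCH * dpi
    return pixels_w * pixels_h * bytes_per_pixel


def print_data_rate(pages_per_minute: float = 8.0) -> float:
    """Bytes per second the print engine consumes."""
    return page_bytes() * pages_per_minute / 60.0


def required_ratio(disk_bytes_per_s: float, pages_per_minute: float = 8.0) -> float:
    """Compression ratio needed so the disk keeps up with the print engine."""
    return print_data_rate(pages_per_minute) / disk_bytes_per_s
```

A page comes to about 62 million bytes, the engine consumes about 8.25 million bytes per second, and a 2 Mbytes/s disk therefore needs roughly 4.1:1 compression. A 1.7 Mbytes/s disk would need closer to 4.9:1.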

Figure 28 is a block diagram of one embodiment of an integrated
circuit (IC) chip containing the printer compression/decompression.
Referring to Figure 28, pixel data interface 2801 is coupled to the band
buffer (not shown). Pixel data interface 2801 generates addresses for
reading and writing pixels from and to the band buffer, respectively. An
optional reversible color space 2802 may be included to perform a
reversible color space conversion. Coefficient data interface 2804
generates addresses for reading and writing coefficients and properly
assembles two-byte coefficients. Coefficient data interface 2804, along
with pixel data interface 2801, handles any line buffering or coefficient
buffering that is required to be in external memory. Coefficient data
interface 2804 and the use of a reversible color space are discussed in
greater detail below.

It should be noted that the double arrows imply that data may flow
in either direction. For instance, in compressing the data, data moves
from left to right through different components of the IC chip. On the
other hand, when decompressing data, the data generally moves from right
to left.

When coding data, pixel data from pixel data interface 2801, or
reversible color space 2802 (if included), are received by wavelet
transform block 2803, which performs the wavelet transform on the pixel
data. In one embodiment, the transform performed by wavelet
transform block 2803 is an overlapped wavelet transform. It provides
energy compaction for both lossless and lossy image compression. For
lossy compression, the block boundary artifacts that plague JPEG are
avoided. The filter coefficients, when properly aligned, are normalized
so that scalar quantization provides good lossy compression results. In
one embodiment, wavelet transform block 2803 performs a 2,6
transform. In another embodiment, wavelet transform block 2803
performs a 2,10 transform. Wavelet transform block 2803 may perform
other well-known transforms. Various implementations of wavelet
transform block 2803 are discussed in greater detail below.

The coefficients output from wavelet transform block 2803 may be
written back to the memory (e.g., the band buffer) via coefficient data
interface 2804 for coding later. In one embodiment, the data that is
written back to memory is less important data, which is described in
detail below. Such data is later read back into the IC chip and coded.

The coefficients output from wavelet transform block 2803 or received
via coefficient data interface 2804 are provided to context model 2805.
Context model 2805 provides the context for encoding (and decoding) data
using encoder/decoder 2806. In one embodiment, context model 2805
supports sending data directly to coding. In this way, context model 2805
operates as the most important context model. An architecture for
implementing various context models has been described above.

In one embodiment, encoder/decoder 2806 comprises a high-speed
parallel coder. The high-speed parallel coder handles several bits in parallel.
In one embodiment, the high-speed parallel coder is implemented in VLSI
hardware or multi-processor computers without sacrificing compression
performance. One embodiment of a high-speed parallel coder that may be
used in the present invention is described in U.S. Patent No. 5,381,145,
entitled "Method and Apparatus for Parallel Decoding and Encoding of
Data", issued January 10, 1995.

In alternative embodiments, the binary entropy coder comprises
either a Q-coder, a QM-coder, a finite state machine coder, etc. The Q- and
QM-coders are well-known and efficient binary entropy coders. The
finite state machine (FSM) coder provides the simple conversion from a
probability and an outcome to a compressed bit stream. In one
embodiment, a finite state machine coder is implemented using table
look-ups for both decoder and encoder. A variety of probability
estimation methods may be used with such a finite state machine coder.
In one embodiment, the finite state machine coder of the present
invention comprises a B-coder defined in U.S. Patent No. 5,272,478,
entitled "Method and Apparatus for Entropy Coding", issued December
21, 1993.

The output of encoder/decoder 2806 is coupled to coded data
interface 2807, which provides an interface to the disk or other storage
medium, or even another channel.

Coded data interface 2807 sends and receives coded data from disk.
In one embodiment, if the SCSI controller is included in the chip, it may
be implemented at this point. In another embodiment, coded data
interface 2807 communicates with an external SCSI controller. Non-SCSI
storage or communication may be used.

During decompression, coded data is received by encoder/decoder
2806 from the disk (or other memory storage or channel), via coded data
interface 2807, and is decompressed therein using contexts from context
model 2805. The coefficients that result from decompression are inverse
transformed by wavelet transform block 2803. (Note that although
wavelet transform block 2803 performs both forward and inverse
transforms in one embodiment, in other embodiments, the two
transforms may be performed by separate blocks.) The output of
transform block 2803 comprises pixels that undergo any optional color
space conversion and are output to the band buffer via pixel data
interface 2801.

The basic timing of the system during printing is shown in Figure
29. Referring to Figure 29, the coded data for each coding unit is read
from disk. As much data as possible is read, and after a short delay
coefficients are decoded. After decoding is complete, the inverse wavelet
transform is computed. After the transform is complete, pixels can be
sent to the print engine. Note that the cross-hatching in Figure 29
indicates when different actions occur for a specific coding unit.

Embedding Coefficients for Storage to Disk

Figure 10 shows the organization of the coded data in the present 
invention. Referring to Figure 10, the most important data 1003 is coded 
in coefficient order (not embedded) immediately after being transformed. 
Therefore, this data does not have to be buffered. In one embodiment, 
the amount of most important data 1003 is limited so that it can always 
be read from disk. 

Some amount of less important data 1004 is buffered, embedded
and written to disk in order of importance. The amount of data that may
be buffered, embedded and written is determined by the transfer time.
That is, the system reads the data until the transfer time from the disk
has expired. The transfer rate of the disk determines how much of the data is
kept. These rates are known and depend on the physical
characteristics of a particular transfer.
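As a minimal sketch of the budget idea (not the patent's implementation), assume the less important data has already been embedded into segments sorted from most to least important. The hypothetical helper `select_embedded_data` keeps segments while the transfer-time budget, after reserving room for the most important data, still allows. For simplicity it stops at whole-segment boundaries, whereas a truly embedded stream could be cut at any byte.

```python
from typing import List, Tuple


def select_embedded_data(segments: List[Tuple[str, int]],
                         transfer_rate_bytes_per_s: float,
                         time_budget_s: float,
                         most_important_bytes: int) -> List[str]:
    """Return labels of the less important segments that fit the budget.

    segments: (label, size_in_bytes) pairs sorted from most to least
    important (hypothetical representation). Everything after the first
    segment that does not fit is dropped, since it is even less important.
    """
    budget = int(transfer_rate_bytes_per_s * time_budget_s) - most_important_bytes
    if budget < 0:
        # The most important data must always be readable in time.
        raise ValueError("most important data exceeds the transfer budget")
    kept = []
    for label, size in segments:
        if size > budget:
            break
        kept.append(label)
        budget -= size
    return kept


if __name__ == "__main__":
    planes = [("bitplane 5", 300), ("bitplane 4", 400), ("bitplane 3", 500)]
    print(select_embedded_data(planes, 1000.0, 1.0, 200))
```

With a 1,000-byte budget and 200 bytes of most important data, the first two segments fit and the third is discarded, which corresponds to least important data that would never be read in time.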

For hard-to-compress images, some data may be discarded during
encoding. This data is shown as least important data 1005. If there is
no possibility that the least important data can be read given
the best-case disk transfer rate, there is no reason to store that data on
disk. For many, and perhaps most, images, no data would be discarded.
The ordering of coded data and how it is accomplished is described
in greater detail above.

In the following, band buffer management during
compression and decompression is discussed, followed by a description of
an embedding scheme for the coded data. Hardware implementations of
the transform, the context model, and parallelism with the
encoder/decoder are also described.



Pixel and Coefficient Interfaces

Figure 30 illustrates one possible embodiment of how pixel data is
organized. Referring to Figure 30, a page (image) 3000 is divided into
bands 3001-3004. In one embodiment, page 3000 may comprise a page
description language or display list description of a page that is used to
generate pixels for the individual bands. In one embodiment, each of
bands 3001-3004 is individually rasterized using display list technology.
Each of bands 3001-3004 is further divided into coding units (e.g., 3001A-D).

An advantage of using multiple coding units per band is that
portions of the band buffer can be used in rotation as workspace during
decompression (similar to ping-pong buffering). In other words, one
portion of the pixels can be decompressed, stored in the band buffer and
sent to the printer, while a second portion of the band buffer can be used
as workspace to store coefficients while decoding, with a third portion of
the buffer being used to store the pixels that correspond to the
Reversible Color Space

The present invention provides for optionally performing
reversible color space conversion that allows converting between two
color spaces so as to be completely reversible and implementable in
integer arithmetic. That is, the color space data that is converted may be
reversed to obtain all of the existing data regardless of any rounding or
truncation that occurred during the forward conversion process.
Reversible color spaces are described in U.S. Patent No. 5,731,988,
entitled "Method and Apparatus for Reversible Color
Conversion", filed May 8, 1995, and assigned to the corporate assignee of
the present invention.

Color space conversions allow the advantages of an opponent
color space without sacrificing the ability to provide lossless results. For
the lossless case, an opponent color space provides decorrelation that
improves compression. For the lossy case, an opponent color space
allows luminance information to be quantized less than chrominance
information, providing for higher visual quality. When a reversible
color space is used with the transform of the present invention, properly
embedding the luminance and chrominance coefficients is superior to
subsampling for lossy compression, while still permitting lossless
compression.
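To illustrate what "completely reversible in integer arithmetic" means, the sketch below uses the reversible color transform (RCT) later adopted in JPEG 2000 as an assumed stand-in; the transform defined in U.S. Patent No. 5,731,988 may differ. The function names `forward_rct` and `inverse_rct` are hypothetical. For 8-bit inputs, luminance stays in 0..255 (8 bits) while each chrominance difference spans -255..255 (9 bits).

```python
from typing import Tuple


def forward_rct(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Integer forward transform: one luminance and two chrominance values."""
    y = (r + 2 * g + b) >> 2  # floor((R + 2G + B) / 4)
    u = r - g                 # 9-bit signed difference for 8-bit inputs
    v = b - g
    return y, u, v


def inverse_rct(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Exact inverse: the rounding in forward_rct is undone, not approximated."""
    # y = g + floor((u + v) / 4); Python's >> floors negative values too.
    g = y - ((u + v) >> 2)
    r = u + g
    b = v + g
    return r, g, b


if __name__ == "__main__":
    # Round-trip check over a grid of 8-bit RGB values (includes 0 and 255).
    for r in range(0, 256, 15):
        for g in range(0, 256, 15):
            for b in range(0, 256, 15):
                assert inverse_rct(*forward_rct(r, g, b)) == (r, g, b)
    print("lossless round trip")
```

The key design point is that `r + 2g + b` equals `u + v + 4g`, so the floor in the forward transform depends only on `u + v`, which the decoder already has.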

If a reversible color space is used, it is desirable to align the
coefficients such that the most significant bits of the 8-bit luminance
components and the 9-bit chrominance components have the same
alignment. For lossy compression, this alignment causes chrominance
data to be quantized twice as much as luminance data, and also allows for
the possibility of lossless compression for luminance and lossy (but very
high quality) compression for chrominance. Both of these results take
advantage of characteristics of the human visual system.
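A small numeric check of the alignment argument, under the simplifying assumption that it is applied directly to 8-bit luminance and 9-bit chrominance values rather than to wavelet coefficients (which carry additional bits). Luminance is shifted left by one bit so the most significant bits line up; discarding the same k bitplanes from both then leaves a maximum error of 2^(k-1) - 1 for luminance and 2^k - 1 for chrominance, i.e., chrominance is quantized twice as coarsely, and with k = 1 luminance remains lossless. The helper names are hypothetical.

```python
def discard_bitplanes(value: int, k: int) -> int:
    """Zero the k least significant bitplanes of |value|, keeping the sign."""
    magnitude = (abs(value) >> k) << k
    return -magnitude if value < 0 else magnitude


def max_error_luma(k: int) -> int:
    # 8-bit luminance is shifted left one bit so its MSB aligns with chrominance.
    return max(abs((discard_bitplanes(y << 1, k) >> 1) - y) for y in range(256))


def max_error_chroma(k: int) -> int:
    # 9-bit signed chrominance (-255..255) is used unshifted.
    return max(abs(discard_bitplanes(c, k) - c) for c in range(-255, 256))


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, max_error_luma(k), max_error_chroma(k))
```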

Other Pixel Operations 
Often a printer will have documents that are mostly or entirely
non-continuous-tone. For example, text images with black and white only (0
and 255 values only) may be common.

In one embodiment, a histogram of each band is computed. For
example, 0/255 black-and-white-only images (the K component) can be
remapped to 0/1 images. Similar compactions can be made for spot color
images. Note that if compaction is used, compression must be lossless.
However, the lossless compression achieved is improved substantially
when the compaction is performed.
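A minimal sketch of histogram-based compaction, assuming a band is a flat list of 8-bit values: the values that actually occur are sorted into a palette and each pixel is replaced by its palette index, so a 0/255 band becomes a 0/1 band. The palette would have to be stored alongside the coded data, and any error in an index maps to a completely different value, which is why compression must be lossless. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def compact_band(pixels: List[int]) -> Tuple[List[int], List[int]]:
    """Map the distinct values in a band onto 0..n-1.

    Returns (indices, palette); palette[i] is the original value for index i.
    """
    palette = sorted(set(pixels))  # values with a non-zero histogram count
    index_of: Dict[int, int] = {value: i for i, value in enumerate(palette)}
    return [index_of[p] for p in pixels], palette


def expand_band(indices: List[int], palette: List[int]) -> List[int]:
    """Inverse mapping; exact only if the indices were coded losslessly."""
    return [palette[i] for i in indices]


if __name__ == "__main__":
    band = [0, 255, 255, 0, 255]
    indices, palette = compact_band(band)
    print(indices, palette)  # [0, 1, 1, 0, 1] [0, 255]
    assert expand_band(indices, palette) == band
```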

Alternatively, instead of using the overlapped wavelet transforms
described herein, binary and spot color images could be handled by a
lossless, bitplane-based, JBIG-like context model.

In another alternate embodiment, the system may be designed to
include a binary mode. Figure 35 illustrates one embodiment of a binary
context model that is similar to the JBIG-style context model template.
Referring to Figure 35, shift registers 3501-3503 provide multiple bits per
the JBIG standard. Shift registers 3501 and 3502 receive the second and first
above lines from line buffer 3500. The "above" lines provide the bits
corresponding to pixels in the northwest (NW), north (N), and northeast
(NE) positions of the template, such as shown in Figure 37. The outputs
of shift registers 3501 and 3502 are provided directly to context model
3505. The output of shift register 3503 is provided to an optional mux
3504, which can implement the adaptive template of the JBIG standard.

Context model 3505 is coupled to probability estimation machine 3506,
which is in turn coupled to bit generator 3507. Context model 3505,
probability estimation machine 3506, and bit generator 3507 operate with
respect to each other in a manner well known in the art.

The output of mux 3504, in conjunction with the outputs of shift
registers 3501 and 3502 and feedback from the bit generator, forms the
context bin address used to address the context memory. In one
embodiment, context memory 3505 includes 1,024 contexts with six bits to
describe each probability state. This requires a context memory of 1,024
by six bits (6,144 bits).
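The following sketch shows how a 10-bit context bin address (1,024 contexts) can be assembled from two above lines and the current line. The pixel positions follow a JBIG-like three-line template and are an assumption, not necessarily the exact template of Figure 35; the adaptive-template mux 3504 is not modelled, and `context_address` is a hypothetical name. The last bit shifted in is the previously decoded pixel on the current line, which is the source of the "bit to context" delay.

```python
from typing import List

NUM_CONTEXTS = 1 << 10                           # 1,024 context bins
STATE_BITS = 6                                   # bits per probability state
CONTEXT_MEMORY_BITS = NUM_CONTEXTS * STATE_BITS  # 6,144 bits


def pixel(row: List[int], i: int) -> int:
    """Pixels outside the line are treated as 0."""
    return row[i] if 0 <= i < len(row) else 0


def context_address(two_above: List[int], above: List[int],
                    current: List[int], x: int) -> int:
    """Build a 10-bit context from an assumed three-line template."""
    template = [
        pixel(two_above, x - 1), pixel(two_above, x), pixel(two_above, x + 1),
        pixel(above, x - 2), pixel(above, x - 1), pixel(above, x),
        pixel(above, x + 1), pixel(above, x + 2),
        pixel(current, x - 2),
        pixel(current, x - 1),  # previously decoded bit, shifted in last
    ]
    ctx = 0
    for bit in template:
        ctx = (ctx << 1) | bit
    return ctx  # 0 .. NUM_CONTEXTS - 1
```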

Because the bit generator provides a decoded bit from the current
line as part of the context address, there is a large "bit to context" delay,
including the access time for the context memory.

Figure 36 illustrates an alternative embodiment which utilizes the
decoded bit from the current line to access the probability estimation
machine in conjunction with a same address block 3601, which receives
the outputs of shift registers 3501 and 3502 and the output of multiplexor
3504. The PEM 3506 receives the previous bit and uses it to select the
proper one out of the pair of contexts used. The selected context is
updated, and both contexts are written back to memory. The same
address block 3601 detects addresses that have already been read, in
which case the data is already in the probability estimation machine. The
same address block 3601 also sends the signal to use the data already in
the PEM (which may be updated data) instead of the stale information in memory.

In one embodiment, the decoder includes 1024 context bins with
six bits to describe each probability state. This requires a context
memory of 512 by 12 bits. The outputs of shift registers 3501 and 3502,
along with the output of multiplexor 3504, provide a partial context bin
address which only lacks the previous bit. This results in the
selection of a pair of context bins from context memory 3505. More than
one bit of a context bin can be excluded from the partial context. Each
memory location contains 2^n probability states, where n is the number of
excluded bits.
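The pairing can be checked in a few lines. Assuming the previously decoded bit is the least significant bit of the full 10-bit context address, a 1,024 × 6-bit memory can be repacked as 512 words of 12 bits, each addressed by the 9-bit partial context; the word can be fetched before the previous bit is known, and the previous bit then only selects one 6-bit half. The write-back helper updates the selected half and returns the whole word so that both states can be written back together. All names are hypothetical.

```python
from typing import List

STATE_BITS = 6
STATE_MASK = (1 << STATE_BITS) - 1


def build_paired_memory(full_memory: List[int]) -> List[int]:
    """Repack 2*N six-bit states into N twelve-bit words.

    Assumes the full context address is (partial << 1) | previous_bit,
    so entries 2p and 2p+1 share the partial address p.
    """
    return [(full_memory[2 * p + 1] << STATE_BITS) | full_memory[2 * p]
            for p in range(len(full_memory) // 2)]


def fetch_pair(paired_memory: List[int], partial: int) -> int:
    # Can be issued before the previous bit has been decoded.
    return paired_memory[partial]


def select_state(word: int, previous_bit: int) -> int:
    # Late selection once the bit generator has produced the previous bit.
    return (word >> (STATE_BITS * previous_bit)) & STATE_MASK


def write_back(word: int, previous_bit: int, new_state: int) -> int:
    # Replace only the selected half; the other state is written back unchanged.
    shift = STATE_BITS * previous_bit
    return (word & ~(STATE_MASK << shift)) | (new_state << shift)
```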

It should be noted that the "bit to context" delay is reduced. The
context memory access can occur before the previous bit is decoded. The
processing of the PEM state for both states in a pair can begin in parallel
before the previous bit is decoded. High-speed operation can be achieved.
Encoder Rate Control

In addition to having the ability to quantize data, performing rate
control in the encoder also requires measuring the rate so that decisions
on quantization can be made. If the rate indicates that compression is not
good (i.e., not at a desired level), quantization may be increased. On the
other hand, if the rate indicates that compression is higher than necessary,
quantization may be decreased. Rate control decisions must be made
identically in the encoder and the decoder.
One method of assuring that the encoder and decoder make the
same decisions is to use signaling. The encoder measures the rate at
predetermined intervals and stores the quantization, Q, in memory for
future use in the next interval. The decoder simply recalls the
quantization from memory for each interval. This requires extra
memory. For example, an on-chip SRAM with 256 locations of 2 bits (for
indicating a change in Q by +2, +1, 0, or -1, or for storing Q as 1, 2, 3, or 4)
would be enough for changing the quantization, Q, every 16 lines of a
4096-line image (4096/16 = 256 intervals).
There are many options for rate measurement. Figure 34
illustrates an encoder and decoder pair. Referring to Figure 34, an
encoder/decoder pair is shown containing context models (CM),
probability estimation models/machines (PEM) and bit generators (BG),
along with a run count reorder unit, an interleaved word reorder unit and a
shifter. Each of these is well known in the art. For a description, see U.S.
Patent Nos. 5,381,445 and 5,583,500, assigned to the corporate assignee of
the present invention and incorporated herein by reference.

The rate measurement must be explicit if the decoder cannot
measure it at the same place. For instance, the rate measurement may be
provided to the decoder as part of the compressed code stream.

Another option for rate measurement, illustrated as the smaller
circle (position 2 in Figure 34), is to count the starts of interleaved words in
the encoder. In another embodiment, this is performed after the bit
generation stage (position 4 in Figure 34). Because the encoder and
decoder start a codeword at the same time, implicit signaling of the rate
may be used. The counting may be performed with counting hardware
that comprises a register and an adder that adds the codeword lengths
and determines the average codeword length. Hardware to perform the
counting and determine average numbers of bits is well known in the
art and is shown in Figure 34 as block 3401. It will be apparent that this
block may be used to take similar measurements at other locations in the
system (e.g., positions 1, 2, 3, and 4, on both encoder and decoder).

Other options would be to count the size of completed codewords
after the bit generator and before the interleaved word reorder unit
(position 3 in Figure 34), or to determine the amount of data actually
written to disk (position 1 in Figure 34).
Rate measurement can be implicit if both the encoder and decoder
perform the same rate determination calculation. For example, the
encoder and decoder could accumulate the average size of a codeword
each time a new codeword is started. This is represented by position 4 in
Figure 34. (The actual size cannot be used, since the encoder does not
know the size until the end of the codeword.) If the R-codes used in the
core vary in size from R2(0) through R2(7), the average codeword size
varies from 1 to 4.5 bits. If probability estimation works well, using the
average should be very accurate. In other cases, the differences between
the minimum and the maximum codeword lengths versus the average
are typically not so great, so the estimate should still be useful. An
R2(k) codeword is between 1 and k + 1 bits long, so the average size of an
R2(k) codeword is (1 + (k + 1))/2 = (k + 2)/2 bits.
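A minimal sketch of implicit rate measurement: the encoder and decoder both add the average R2(k) codeword size, (k + 2)/2 bits, each time a codeword starts, and then apply the same quantization rule to the same estimate. The class name and the rule in `next_q` (raise Q when over budget, lower it when under half the budget) are hypothetical choices for illustration only.

```python
def average_r2_bits(k: int) -> float:
    """Mean of the shortest (1 bit) and longest (k + 1 bits) R2(k) codewords."""
    return (k + 2) / 2


class ImplicitRateMeter:
    """Run identically in the encoder and the decoder."""

    def __init__(self) -> None:
        self.estimated_bits = 0.0

    def codeword_started(self, k: int) -> None:
        # Only k is needed, and both sides know it when a codeword starts.
        self.estimated_bits += average_r2_bits(k)


def next_q(q: int, estimated_bits: float, target_bits: float) -> int:
    """Hypothetical rule: coarser when over budget, finer when well under."""
    if estimated_bits > target_bits:
        return q + 1
    if estimated_bits < target_bits / 2 and q > 1:
        return q - 1
    return q
```

Because the decision depends only on values both sides already share, no Q values need to be stored or transmitted.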

The goal may be that in almost all cases the most important data
will compress well, and no quantization (Q=1) will be required. Only
"pathological" images will require quantization (Q>1). Including the
quantization feature, however, can guarantee that the system will not
break on pathological images.

Another benefit of encoder rate control is that the encoding of less
important data can be stopped when the maximum bandwidth is
exceeded. This increases the speed of encoding and decreases the total
time to output data (e.g., decreasing the total time to print).

Keeping track of the effects of quantization changes (the value of
Q) is important. For example, the definition of the largest coefficient in a
group of coefficients needs to be consistent when the quantization
changes. Also, the reconstruction of quantized coefficients (when
bitplanes are discarded) needs to take into account the number of
discarded bitplanes for best results.

High-Speed Parallel Coding and Context Model

The entropy coding portion of the present invention comprises
two parts. First, high-speed coding cores, operating in parallel, provide
probability estimation and bit generation. Second, a context model
provides the contexts used for coding.

The number of cores required to achieve the desired speed is 
application dependent. 

The other part of the entropy coding system is the context model
for the coefficients of the present invention. There are a large number of
trade-offs possible in implementing the context model. In one
embodiment, the present invention provides a context model with low
hardware cost that provides parallelism to support the use of the
high-speed parallel coders of the present invention. Embodiments of the
context model are described above.

Although only the context model for wavelet coefficients is
described herein, the present invention is not limited to context models
that only support wavelet coefficients. For instance, if a bitplane coding
mode is desired for binary or spot color images, an additional context
model, such as that described in JP.A-1M4484, entitled "Compression and
Decompression with Wavelet Style and Binary Style Including Quantization
by Device-Dependent Parser", can be used.



Parallelism

In one embodiment, four high-speed coding cores are used to
encode/decode eight bits per coefficient, where coefficients range from 8
to 12 bits (13 if a reversible color space is used). In one embodiment, a
core is assigned to each of the four components, simplifying parallelism
and data flow. Each coefficient can take up to 16 cycles for
encoding/decoding bits (including decisions for look-ahead, etc.).
The present invention maintains the cores for each component in
sync, even if some cores are idle because of their successful look-ahead or
because another core is handling a sign bit after a first "on" bit. The total
time for running the context model will vary depending on the data,
specifically the effectiveness of look-ahead and, to a lesser extent, the
locations of first "on" bits.

Whereas many alterations and modifications of the present
invention will no doubt become apparent to a person of ordinary skill in
the art after having read the foregoing description, it is to be understood
that the particular embodiment shown and described by way of
illustration is in no way intended to be considered limiting. Therefore,
references to details of various embodiments are not intended to limit
the scope of the claims, which in themselves recite only those features
regarded as essential to the invention.
1. A method comprising the steps of:
dividing a coefficient into most important data and less important data;
sending the most important data to a context model for coding immediately in coefficient order;
storing the less important data and a plurality of signaling bits in memory; and
after coding most important data of all coefficients in the set of coefficients, coding the less important data in an embedded order based, in part, on the plurality of signaling bits.

2. The method defined in Claim 1 wherein the signaling bits comprise a
first bit and a second bit.

3. The method defined in Claim 1 wherein a first of the signaling bits
indicates if the first bit of the less important data of the coefficient is the head or tail
bit, and a second of the signaling bits indicates that the first bit of the less important
data of the coefficient is a head bit.

4. The method defined in Claim 1 wherein the signaling bits are stored
adjacent the less important data.

5. A forward transform comprising:
an input buffer having an input coupled to receive input data and first and second outputs to transfer even and odd samples;
a first level transform unit coupled to receive the even and odd samples and generate coefficients, wherein horizontal low pass and vertical high pass coefficients are outputs of the forward transform;
a memory having a first input coupled to receive ss coefficients generated by the first level transform unit and a second input to receive ss coefficients from higher level transform filtering;
an order unit having a first input coupled to the memory to order ss coefficients for higher levels of filtering; and
a first filter unit coupled to the order unit to apply a plurality of transform levels, wherein the filter unit performs a higher level transform on ss coefficients received from the order unit, and wherein the filter unit generates ss coefficient values that are fed back to the second input of the memory and the second input of the order unit.

6. The forward transform defined in Claim 5 wherein the first
level transform operates on 2x2 blocks of input data.

7. The forward transform defined in Claim 5 wherein the first level transform comprises:
a second filter unit to perform a first level horizontal transform, wherein the second filter unit has a first output and a second output;
a first single delay coupled to the first output of the second filter;
a second single delay coupled to the second output of the second filter;
a double delay coupled to the second output of the second filter;
a first multiplexer (MUX) coupled to receive the outputs of the first single delay and the double delay;
a second MUX coupled to receive an output of the first filter unit and an output of the second single delay; and
a third filter unit coupled to receive outputs from the first and second muxes and to perform a first level vertical transform.



8. An apparatus for compressing an image, said apparatus 




workspace memory is the same size as the image and the compressor
uses the workspace memory for encoding the image using coefficients
that are larger than the pixels in the image.



9. A method for coding information comprising most important data and less important data, said method comprising the steps of:
coding the most important data;
coding the position of the first bit plane in the less important data for each coefficient that is not comprised entirely of zero head bits; and
coding each bit plane of less important data that does not entirely comprise zero head bits.

10. The method defined in Claim 9 wherein the information
comprises wavelet coefficients.

11. The method defined in Claim 9 wherein the step of coding
the position of the first less important bit plane comprises performing a
look-ahead over the entire bit planes of less important data.

12. The method defined in Claim 9 wherein the step of coding the most important data comprises the steps of:
for each tree,
coding the ss coefficient;
performing a look ahead for the most important data; and
for each non-ss coefficient,
coding a head or tail bit for each bit plane with data, and
coding a sign bit if the coefficient is not zero.



13. The method defined in Claim 12 wherein the look ahead
comprises a tree look ahead, and the step of performing the look ahead
comprises coding the ss-coefficients and coding the first zero bit plane
with non-zero head bits for the whole tree.

14. The method defined in Claim 12 wherein the most
important data is processed one wavelet tree at a time.

15. The method defined in Claim 9 wherein the lookahead
determines bit planes that comprise all zero head bits for all non-ss
coefficients in the wavelet tree.

16. The method defined in Claim 15 further comprising the
step of identifying the first bit plane to code individually.

17. The method defined in Claim 16 wherein the step of
identifying the first bit plane to code individually comprises indicating all
non-ss coefficients of the second decomposition are zero using a first bit
and indicating all non-ss coefficients of the first decomposition are zero
using a second bit.

18. The method defined in Claim 9 wherein the step of coding the most important data comprises the following steps:
for each tree,
coding the ss coefficient;
performing a lookahead to determine bitplanes that are all zero head bits for all non-SS coefficients in said each tree;
determining if the most important data of the entire tree is zero;
if the most important data for the entire tree is not zero, then,
for all coefficients in the tree,
coding bits of the current coefficient for all bitplanes, wherein the current coefficient is the first non-ss coefficient in the tree and starting with the first bit plane that contains data;
coding the sign bit if the current coefficient is not zero.

19. The method defined in Claim 9 wherein the step of coding the less important data comprises the steps of:
for each tree,
for each coefficient,
performing a lookahead if at the start of a lookahead interval;
coding a head or tail bit if the lookahead is not active; and
coding a sign bit if the first on bit has occurred and the lookahead is not active.

20. An apparatus for coding information comprising most important data and less important data, said apparatus comprising:
means for coding the most important data;
means for coding the position of the first bit plane in the less important data for each coefficient that is not comprised entirely of zero head bits; and
means for coding each bit plane of less important data that does not entirely comprise zero head bits.

21. The apparatus defined in Claim 20 wherein the information
comprises wavelet coefficients.

22. The apparatus defined in Claim 20 wherein the means for
coding the position of the first less important bit plane comprises means
for performing a look-ahead over the entire bit planes of less important
data.

23. The apparatus defined in Claim 20 wherein the means for coding the most important data comprises:
means for coding the SS coefficient for each tree;
means for performing a look ahead for the most important data for each tree;
means for coding a head or tail bit for each bit plane with data for each non-SS coefficient for each tree; and
means for coding a sign bit if the coefficient is not zero for each non-SS coefficient for each tree.

24. The apparatus defined in Claim 23 wherein the look ahead
comprises a tree look ahead, and the means for performing the look
ahead comprises means for coding the SS-coefficients and means for
coding the first zero bit plane with non-zero head bits for the whole tree.

25. The apparatus defined in Claim 23 wherein the most
important data is processed one wavelet tree at a time.

26. The apparatus defined in Claim 20 wherein the means for
performing the lookahead determines bit planes that comprise all zero
head bits for all non-ss coefficients in the wavelet tree.



27. The apparatus defined in Claim 26 further comprising
means for identifying the first bit plane to code individually.

28. The apparatus defined in Claim 27 wherein the means for
identifying the first bit plane to code individually comprises means for
indicating all non-ss coefficients of the second decomposition are zero
using a first bit and means for indicating all non-ss coefficients of the first
decomposition are zero using a second bit.

29. The apparatus defined in Claim 20 wherein the means for coding the most important data comprises:
means for coding the SS coefficient for each tree;
means for performing a lookahead to determine bitplanes that are all zero head bits for all non-SS coefficients in said each tree;
means for determining if the most important data of the entire tree is zero for each tree; and
means for coding bits of the current coefficient for all bitplanes for all coefficients in the tree if the most important data for the entire tree is not zero, wherein the current coefficient is the first non-ss coefficient in the tree and starting with the first bit plane that contains data;
means for coding the sign bit if the current coefficient is not zero for all coefficients in the tree if the most important data for the entire tree is not zero.

30. The apparatus defined in Claim wherein the means for coding the less important data comprises:
means for performing a lookahead for each coefficient for each tree if at the start of a lookahead interval;
means for coding a head or tail bit for each coefficient for each tree if the lookahead is not active; and
means for coding a sign bit for each coefficient for each tree if the first on bit has occurred and the lookahead is not active.

31. A method for m-ary coding of information, said method comprising the steps of:
examining a predetermined number of coefficients;
entropy coding all of the head bits one per cycle until all head bits in the predetermined number of coefficients are coded; and
coding the sign and tail bits of the predetermined number of coefficients in the same cycle.

32. An integrated circuit (IC) chip comprising:
a pixel data interface to transfer pixel data between the IC chip and memory;
a reversible wavelet transform coupled to the pixel data interface to transfer information to and from the memory via the pixel data interface;
a context model coupled to the reversible wavelet transform to provide contexts for coding the data provided therefrom; and
an encoder to encode coefficients generated by the reversible wavelet transform based on contexts provided by the context model.

33. The IC defined in Claim 32 further comprising a coefficient
data interface coupled to transfer coefficients from the transform to the
memory without coding.

34. The IC defined in Claim 32 wherein the coefficient data
interface transfers coefficients from memory to the context model for
encoding.

35. The IC defined in Claim 32 further comprising a coded data
interface for providing entropy coded data to memory.

36. The IC defined in Claim 35 further comprising a decoder to
decode encoded data.

37. The IC defined in Claim 36 further comprising a coded data
interface to provide the decoder with entropy coded data for decoding.

38. The IC defined in Claim 32 further comprising a reversible
color space converter coupled between the pixel data interface and the
reversible wavelet transform to perform reversible color space
conversion.



39. A decoder for decoding coded data, said decoder comprising:
at least one bit generator coupled to receive the coded data and to decode the coded data based on a probability estimation, wherein said at least one bit generator generates a decoded bit from a current line;
a probability estimation machine coupled to said at least one bit generator to provide the probability estimate based on the decoded bit from the current line; and
a context model coupled to the probability estimation machine to provide a plurality of contexts to the probability estimation machine based on a partial context address, wherein the probability estimation machine selects among the plurality of contexts based on the decoded bit.

40. The decoder defined in Claim 39 further comprising a same
address indicator coupled to the probability estimation machine to detect
addresses that have already been read, indicating that the data is already
available in the probability estimation machine, wherein the same
address indicator generates an indication to the probability estimation
machine indicating that the data is already in the probability estimation
machine based on the partial context being addressed.

41. The decoder defined in Claim 39 further comprising a
plurality of shift registers coupled to the context model to provide the
partial context address.
42. The decoder defined in Claim 39 further comprising a line
buffer coupled to provide a plurality of above lines.



43. A context model comprising:
a first bit plane unit coupled to receive less important data and most important data to determine which bit planes have data in them, wherein the first bit plane unit generates an indication of the bit plane with the first on bit for the entire coding unit for use when processing the less important data;
a comparison mechanism coupled to receive the less important data and the most important data to generate signaling information for the less important data;
a memory coupled to receive the sign bit, the most important data and an indication of the first bit plane having data, wherein the memory delays coefficients to provide conditioning information;
a first context model coupled to the memory to provide contexts for sign bits;
a second context model coupled to the memory and the most important data to provide contexts for head bits; and
a third context model coupled to the memory and the most important data to provide contexts for tail bits.

44. The context model defined in Claim 45 further comprising a separation mechanism to separate the input data into the more and less important data.



The context model defined in Claim 44 wherein the separation mechanism comprises a barrel shifter.

The context model defined in Claim 45 wherein the barrel shifter shifts data based on a quantization level.

47. The context model defined in Claim 43 wherein the comparison mechanism comprises:

a first comparison unit to determine if the most important data is equal to zero to indicate that a tail bit has already occurred, wherein the output of the first comparison unit is a tail on bit;

a second comparison unit to determine whether the most important data is equal to one, wherein an output of the second comparison unit is equal to zero when the most important data is equal to one; and

a multiplexer coupled to receive the output of the second comparison unit and the sign bit to output a first tail bit if the select input is in a first state and to output a sign if the select input is in a second state.
48. The context model defined in Claim 46 wherein the select input comprises the tail on output of the first comparison unit.

49. The context model defined in Claim 46 wherein at least one of the first and second comparison units comprises a *-bit comparator.

The context model defined in Claim wherein the first bit plane unit comprises:

an OR gate coupled to receive a coefficient and a feedback;

a register coupled to receive the output of the OR gate; and

a priority encoder coupled to receive the output of the register to record the first bit plane of the coefficient that has a 1.

51. The context model defined in Claim 50 wherein the register includes a reset input to reset the contents of the register at the start of the coding unit.

The context model defined in Claim 50 wherein the reset input also resets the contents of the register at the start of each tree.

A method for performing compression comprising the steps of:

determining the average length of codewords to identify an encoding rate; and

adjusting a compression rate based on a desired amount of compression.

54. The method defined in Claim 53 further comprising the steps of:

increasing an amount of quantization if the encoding rate indicates compression is below a first desirable level; and

decreasing the amount of quantization if the encoding rate indicates that compression is above a second desirable level.

55. The method defined in Claim 54 wherein the first and second desirable levels are not the same.

56. The method of Claim 53 wherein the step of determining the average length of codewords is performed after bit generation.

57. The method of Claim 53 further comprising the step of signaling a new compression rate to a decoder.

58. The method of Claim 53 wherein the signaling is explicit.



59. The method of Claim 57 wherein the signaling is implicit.



60. A system comprising:
a context model;
a probability estimation machine coupled to the context model;
a bit generator coupled to the probability estimation machine; and
an encoder rate control coupled to an output of the bit generator to control the encoding rate by determining average codeword length.

61. The system defined in Claim 60 wherein the encoder rate control adjusts quantization.

62. The system defined in Claim 60 further comprising a signaling block to signal a decoder regarding a new quantization level.

63. The system defined in Claim 60 further comprising a signaling block to generate header data for a compressed data stream output of the encoder, which is concatenated onto the compressed bit stream to indicate to the decoder a new level of quantization.

64. The system defined in Claim 60 wherein the encoder rate control stores an indication of the quantization level, as necessary, for subsequent use by the decoder.



65. A method for processing a least important portion of data bitplanes for a set of transformed coefficients, the method comprising the steps of:

reading a first amount of data from memory; and

writing a second amount of data greater than the first to memory while reading the first amount of data, to compensate for transformed bitplanes with less data in the lower order bitplanes of the set of transformed coefficients.

66. The method defined in Claim 65 wherein the bits required to store the least important chunk coefficients increase as the bit plane number decreases.

Figure 31 illustrates a band buffer 3101 of page 3100. Band buffer 3101 comprises coding units 3101A-D. Coding units 3101A and 3101B act as a workspace for the decompressor by storing coefficients. Coding unit 3101C stores pixels to be output to the printer (or channel), while coding unit 3101D acts as workspace for the decompressor by storing the next pixels.

The portions of band buffer 3101 can be used in rotation as the entire page 3100 is printed. For instance, for the next coding unit, the pixels in coding unit 3101D are the pixels to be output to the printer. When that occurs, coding units 3101B and 3101C will be used as workspace for the decompressor to store coefficients. Also at that time, coding unit 3101A will be used as the workspace for the decompressor to store the next pixels to be output to the printer.
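The rotation can be sketched as a cyclic assignment of roles to the four coding units. This is a hypothetical model, assuming the pattern stated for the first two steps (3101C printed, then 3101D printed) continues in the same cyclic order; the function name `roles` is illustrative only.

```python
# Hypothetical sketch of the band-buffer rotation described for Figure 31.
UNITS = ("3101A", "3101B", "3101C", "3101D")


def roles(step):
    """Return (output_unit, next_pixels_unit, coefficient_workspace) for a step.

    Step 0 is the arrangement in Figure 31: 3101C is being printed, 3101D
    receives the next pixels and 3101A/3101B hold coefficients. Later steps
    assume the roles keep advancing by one unit per coding unit.
    """
    n = len(UNITS)
    out = (2 + step) % n
    nxt = (out + 1) % n
    work = ((out + 2) % n, (out + 3) % n)
    return UNITS[out], UNITS[nxt], tuple(UNITS[i] for i in work)


for step in range(3):
    print(step, roles(step))
```

Two units always hold coefficients because, as noted below, coefficients need more storage than pixels.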

In the present invention, the coefficients are bigger than pixels. Therefore, twice as much memory (two coding units) is allocated to the coefficient workspace. In an alternate embodiment, the bands may be divided into more or fewer coding units. For instance, in one embodiment, the bands may be divided into eight coding units each.

Memory Bandwidth

Together, the pixel data interface and the coefficient data interface manage the band buffer memory efficiently. If fast page mode DRAM, Extended Data Out (EDO) DRAM, or other memories which favor consecutive accesses are used, then these interfaces transfer data from consecutive addresses in bursts long enough to make efficient use of the potential bandwidth of the memory. Some small buffers may be needed to support burst accesses to consecutive addresses.

Figure 32 illustrates a timing diagram of decoding that shows concurrent memory access requirements. Referring to Figure 32, the bandwidth required for decoding is as follows. Recall that in one embodiment, a 2 MHz pixel clock, an 8 MHz component clock and a 32 MHz decoder clock are used, and that the print engine requires 1 byte/component-clock. The transform reads 2 bytes per coefficient and writes 1 byte per component, i.e., 3 bytes of memory traffic per component. If the transform is performed in half the coding unit time, it would require 6 bytes/component-clock. The speed of the transform is limited by memory bandwidth, not computation time: if a bandwidth of 24 bytes/component-clock is available, the transform could be computed in one-eighth of the coding unit time. The transform may require additional bandwidth if external memory is used for seams. In one embodiment, the decoding of coefficients requires writing two bytes per component-clock for the most important part of the coded data. Decoding requires a read and a write of one byte per component-clock for each bitplane of the less important part of the coded data; this may be less in some embodiments. Bandwidths of 4 bytes/component-clock and 24 bytes/component-clock, respectively, would be required if both operations took half the coding unit time. Additional bandwidth might be required if external memory were used for context seam information.
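The figures above follow from dividing the bytes moved per component by the fraction of the coding unit time allowed. A minimal sketch of that arithmetic, assuming six LIC bitplanes (bitplanes 5 through 0, as in Table 8); the constant and function names are illustrative:

```python
# Memory-bandwidth arithmetic for the decoding example above.
# Units: bytes per component-clock. `fraction` is the share of the
# coding unit time an operation may take (0.5 = half).

TRANSFORM_BYTES = 2 + 1       # 2 bytes read per coefficient + 1 byte written per component
MIC_BYTES = 2                 # bytes written per component for the most important chunk
LIC_BYTES_PER_PLANE = 1 + 1   # one byte read + one byte written per LIC bitplane
LIC_BITPLANES = 6             # assumption: bitplanes 5..0, as in Table 8


def bandwidth(bytes_per_component, fraction):
    """Bandwidth needed to move `bytes_per_component` in `fraction` of the coding unit time."""
    return bytes_per_component / fraction


print(bandwidth(TRANSFORM_BYTES, 0.5))                       # 6.0
print(bandwidth(TRANSFORM_BYTES, 1 / 8))                     # 24.0
print(bandwidth(MIC_BYTES, 0.5))                             # 4.0
print(bandwidth(LIC_BYTES_PER_PLANE * LIC_BITPLANES, 0.5))   # 24.0
```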

In one embodiment, the maximum burst mode transfer rate is 4 memory accesses per component-clock (one access per coder clock). Therefore, with a 32-bit data bus, the maximum transfer rate is somewhat less than 16 bytes/component-clock. With a 64-bit data bus, the maximum transfer rate is somewhat less than 32 bytes/component-clock.
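The 16 and 32 byte figures are the ideal products of accesses per component-clock and bus width; the text notes the achievable rate is somewhat less. A small sketch, assuming the coder clock is the 32 MHz decoder clock mentioned earlier (so 32 / 8 = 4 accesses per component-clock); `peak_burst_rate` is a hypothetical helper:

```python
def peak_burst_rate(accesses_per_component_clock, bus_bits):
    """Upper bound on burst transfer rate, in bytes per component-clock.

    The achievable rate is somewhat below this bound.
    """
    return accesses_per_component_clock * bus_bits // 8


# Assumption: 32 MHz coder clock / 8 MHz component clock = 4 accesses.
ACCESSES = 32 // 8

print(peak_burst_rate(ACCESSES, 32))  # 16
print(peak_burst_rate(ACCESSES, 64))  # 32
```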

Reduction of LIC Memory Bandwidth Requirements 

Each bit of each coefficient in the LIC requires a read and a write of external memory during decoding. (Encoding only requires a read.) These memory accesses account for the majority of the memory bandwidth required. In one embodiment, instead of storing each LIC coefficient in 8 bits, the present invention stores the coefficients using fewer than 8 bits when possible to reduce the bandwidth requirements.

Table 8 shows how much memory is required to store LIC coefficients for the decoding of each bitplane. Referring to Table 8, when processing the MIC, one bit per coefficient is written, which is the tail-on bit. What is written for bitplane 5 is read back for bitplane 4: 2-3 bits that include the tail-on bit, the value of bit 5, and, if bit 5 was a 1, a sign bit. The percentage indicates, for each bitplane, what percentage of coefficients are participating. This may be made clearer by looking at Figure 13B. Referring to Figure 13B, bitplane 5 has coefficients from all subbands participating because all coefficients from the DD1 to the DS4 and SD4 subbands have data in bitplane 5 (as indicated by shading). Bitplane 0 has coefficients only from the DD1 subband. As shown in Table 8, both bitplanes 4 and 5 have coefficients from all subbands, so the percentage is 100%, while bitplane 0 has only 25% of the coefficients (those in the DD1 subband). As decoding proceeds, some coefficients are completed before bitplane 0 is reached.

Table 8 - Bits Required to Store LIC Coefficients While Decoding

Bitplane  Bitplane  Bits per     Contents                     Percent of coefficients,
written   read      coefficient                               MSE alignment (write/read)
*         5         1            tail-on                      -/100%
5         4         2-3          tail-on, bit 5, sign?        100%/100%
4         3         3-4          tail-on, bits 4..5, sign?    100%/99%
3         2         4-5          tail-on, bits 3..5, sign?    99%/96%
2         1         5-6          tail-on, bits 2..5, sign?    96%/82%
1         0         6-7          tail-on, bits 1..5, sign?    82%/25%
0         †         7-8          tail-on, bits 0..5, sign?    25%/-

* Written during processing of the most important chunk (MIC).
† Read during the inverse transform.

In Table 8, at the start of decoding, no bitplanes have been decoded; therefore, only one bit per coefficient (the tail-on bit) is read to determine whether the coefficient is a head or a tail. As decoding continues, the number of bits per coefficient increases.
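The "Bits per coefficient" column follows a simple rule: one tail-on bit, plus every LIC bit decoded so far, plus a sign bit once the sign is known. A sketch of that rule, assuming the LIC spans bitplanes 5 through 0 as in Table 8; `bits_written` is a hypothetical name and `None` stands for the MIC pass:

```python
TOP_LIC_BITPLANE = 5  # assumption: LIC bitplanes 5..0, as in Table 8


def bits_written(bitplane, sign_known):
    """Bits stored per LIC coefficient after decoding `bitplane`.

    `bitplane=None` models the MIC pass, which writes only the tail-on bit.
    """
    if bitplane is None:
        return 1
    magnitude_bits = TOP_LIC_BITPLANE - bitplane + 1  # bits `bitplane`..5
    return 1 + magnitude_bits + (1 if sign_known else 0)


for bp in range(TOP_LIC_BITPLANE, -1, -1):
    print(bp, f"{bits_written(bp, False)}-{bits_written(bp, True)}")
```

Running the loop reproduces the 2-3 through 7-8 ranges in the table.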

Figure 33 shows how circular addressing can be used to handle writing data that is larger than the data read. This occurs because the processing produces more bits to write than were originally read. Referring to Figure 33, the process begins by writing 1 bit per coefficient, which is 1/8 of the memory space. Subsequently, 1 bit per coefficient is read, while 2-3 bits per coefficient are written. Then, the 2-3 bits per coefficient are read, while 3-4 bits per coefficient are written. This continues until all the data is done.
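One way to see why circular addressing works: if each pass starts writing right after the end of the region it is reading, the writer can only catch up with unread data when a pass's output exceeds the buffer. The simulation below is a sketch under stated assumptions, not the layout of Figure 33: it uses fixed record widths (the upper end of each Table 8 range, i.e. sign space always reserved), a hypothetical 16 coefficients, and tags each stored bit with (pass, coefficient, bit) to detect clobbering.

```python
def simulate(n, widths, capacity):
    """Simulate successive passes over a circular buffer of `capacity` bits.

    widths[0] is the MIC write; widths[k] is the fixed number of bits per
    coefficient written by pass k. Each pass reads coefficient c's previous
    record, then writes its wider record, starting right after the end of
    the region being read. Returns False if any record is overwritten
    before it is used.
    """
    mem = [None] * capacity
    start = 0  # address of the region written by the previous pass

    # MIC pass: write widths[0] bits per coefficient starting at address 0.
    for c in range(n):
        for b in range(widths[0]):
            mem[(c * widths[0] + b) % capacity] = (0, c, b)

    for p in range(1, len(widths)):
        r, w = widths[p - 1], widths[p]
        write_start = (start + n * r) % capacity
        for c in range(n):
            # Read coefficient c as written by the previous pass.
            for b in range(r):
                if mem[(start + c * r + b) % capacity] != (p - 1, c, b):
                    return False
            # Write its wider record for this pass.
            for b in range(w):
                mem[(write_start + c * w + b) % capacity] = (p, c, b)
        start = write_start

    # The last pass's records must still be intact for the inverse transform.
    last = len(widths) - 1
    w = widths[last]
    return all(
        mem[(start + c * w + b) % capacity] == (last, c, b)
        for c in range(n)
        for b in range(w)
    )


WIDTHS = [1, 3, 4, 5, 6, 7, 8]  # assumption: upper end of each Table 8 range
print(simulate(16, WIDTHS, 16 * 8))      # True: 8 bits per coefficient fits
print(simulate(16, WIDTHS, 16 * 8 - 1))  # False: the final pass wraps onto itself
```

With 8 bits of space per coefficient, the initial 1-bit MIC write occupies 1/8 of the buffer, matching the description above.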

There are some options to simplify the hardware implementation. Instead of always using the minimum number of bits, perhaps only 1, 2, 4, 6 or 8 bits would be used, which would cause one bit to be wasted for some sizes. Space for the sign bit could always be used, even if the sign bit was not coded in the LIC or not known yet.
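Restricting storage to 1, 2, 4, 6 or 8 bits amounts to rounding each required width up to the next allowed size, which wastes at most one bit. A short sketch; `storage_bits` is a hypothetical helper:

```python
ALLOWED_SIZES = (1, 2, 4, 6, 8)  # storage widths suggested in the text


def storage_bits(needed):
    """Round a per-coefficient bit count up to the nearest allowed storage width."""
    for size in ALLOWED_SIZES:
        if size >= needed:
            return size
    raise ValueError(f"{needed} bits exceeds the 8-bit maximum")


for needed in range(1, 9):
    print(needed, storage_bits(needed), storage_bits(needed) - needed)
```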

An option that would further reduce memory bandwidth would be to not store the tail-on bit when it is not necessary. For example, when writing bitplane 0, there are 6 bits which are either head or tail bits. If any of these bits are non-zero, the tail-on must be true, so there is no need to store the tail-on value, and the sign bit can be stored as the seventh bit.
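The tail-on omission yields a self-delimiting record: a decoder reads the 6 head/tail bits first and, if any is non-zero, knows tail-on is true and the next bit is the sign. The packing below is a hypothetical layout (bit order, and reserving the sign bit even for all-zero records, are assumptions), and it assumes the caller passes `tail_on=True` whenever any of the 6 bits is set:

```python
def pack(lic_bits, tail_on, sign):
    """Pack the 6 LIC bits (bits 5..0), then an optional tail-on bit, then the sign."""
    bits = [(lic_bits >> i) & 1 for i in range(5, -1, -1)]
    if lic_bits:
        # A non-zero bit among bits 5..0 implies tail-on is true: omit it.
        return bits + [sign]
    return bits + [int(tail_on), sign]


def unpack(stream):
    """Return (lic_bits, tail_on, sign, bits_consumed) from the start of `stream`."""
    value = 0
    for b in stream[:6]:
        value = (value << 1) | b
    if value:
        return value, True, stream[6], 7
    return value, bool(stream[6]), stream[7], 8


print(len(pack(0b000101, True, 1)))  # 7: tail-on implied
print(len(pack(0, True, 0)))         # 8: tail-on stored explicitly
```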

Memory bandwidth for the most important chunk (MIC) may also be reduced by variable length storage methods. Just using the minimum number of bits instead of always using 8 bits per coefficient would result in a savings. Storing the 6-bit look ahead values (as in Figure 15) instead of zero coefficient bits would result in an even more efficient use of memory.



